Dear Colleagues:

The College Board is committed to conducting and disseminating research that supports and 
informs educators about the appropriate use of its assessments and services. The research also 
addresses critical issues in education. 

This catalog lists research reports, research notes, and other publications available from the College
Board’s website, www.collegeboard.org/research/home. The catalog briefly describes research
publications available free of charge. Introduced in 1981, the Research Report series includes 
studies and reviews in areas such as college admission, special populations, subgroup differences, 
postsecondary readiness and success, and learning and cognition. Extensive research on specific 
College Board programs such as the SAT, the Advanced Placement Program, PSAT/NMSQT, and 
ACCUPLACER is provided. Many historical reports, statistical reports, data tables, and policy 
reports are also available. 

I hope you find this catalog and our library of online materials and resources useful and informative. 

Wayne J. Camara 
Vice President, Research and Development
The College Board 
Research Reports

Predicting College 
Performance 

PSAT/NMSQT® Indicators of College Readiness 

Thomas P. Proctor, Jeffrey Wyatt, and Andrew Wiley 
This report presents a methodology for the creation 
of a PSAT/NMSQT® test score benchmark to identify 
students who are on track toward college readiness when 
completing high school. The proposed benchmark could 
create useful early indicators of whether students in 
grades 10 and 11 are on track to be college ready upon 
high school graduation. 

RR No. 2010-4 Item No.: 10b-2587 8 pgs 

2010 $15 

A Comparison of College Performance of 
Matched AP® and Non-AP Student Groups 

Daniel Murphy and Barbara Dodd 

The research compared the college performance of three
groups of AP® students, all of whom took the AP Exam
(those who earned course credit, those who did not
earn course credit, and those who earned course credit
but elected to take the entry-level college course), with
three Non-AP student groups matched on SAT® scores and
high school rank in 10 AP subject areas. The performance
of the AP groups was also compared with that of matched
groups of students who were concurrently enrolled in a
college course in the same subject area as the AP
students. Students’ records for four entering
classes (1998-2001) at the University of Texas at Austin 
were analyzed. The results showed AP students who earn 
course credit consistently outperform their matched 
Non-AP group on most of the college outcome measures. 
RR No. 2009-6 Item No.: 09b-644 46 pgs 

2009 $15 


The Relationship Between AP Exam 
Performance and College Outcomes 

Krista D. Mattern, Emily J. Shaw, and Xinhui Xiong 
This study focused on the relationship between students’ 
performance in AP English Language, Biology, Calculus, 
and U.S. History, and their subsequent college success. 
For each AP Exam studied, students were divided into 
three groups according to their AP Exam performance 
(no AP Exam taken, score of 1 or 2, and a score of 3 or 
higher). Subsequent college success was measured by 
students’ first-year college grade point average (FYGPA), 
retention to the second year, and institutional selectivity. 
Results indicated that, even after controlling for students’ 
SAT scores and high school grade point average as 
measures of prior academic performance, students with 
an AP score of 3 or higher outperformed the other two 
groups. Additionally, students with an AP score of 1 or 2 
tended to outperform students with no AP scores except 
in terms of FYGPA. 

RR No. 2009-4 Item No.: 09b-269 15 pgs 

2009 $15 

Socioeconomic Status and the Relationship
Between the SAT® and Freshman GPA:
An Analysis of Data from 41 Colleges and
Universities

Paul R. Sackett, Nathan R. Kuncel, Justin J. Arneson,
Sara R. Cooper, and Shonna D. Waters

Critics of educational admissions tests assert that tests 
measure nothing other than socioeconomic status 
(SES), and that their apparent validity in predicting 
academic performance is an artifact of SES. We examine 
relationships among SAT, SES, and freshman grades 
in 41 colleges and universities and show that (a) SES is 
related to SAT scores (r = 0.42 among the population of 
SAT takers), (b) SAT scores are predictive of freshman 
grades (r = 0.47 corrected for school-specific range
restriction), and (c) statistically controlling for SES
reduces the estimated SAT-grade correlation from
r = 0.47 to r = 0.44. Thus, the vast majority of the
SAT-grade relationship is independent of SES: The SAT-grade
relationship is not an artifact of common influences of 
SES on both test scores and grades. 

RR No. 2009-1 Item No.: lib-3396 14 pgs 

2009 $15 

A Comparison of College Performances of AP 
and Non-AP Student Groups in 10 Subject Areas 

Leslie Keng and Barbara G. Dodd 

This study compared the performance of students
in the College Board Advanced Placement Program®
(AP) with that of non-AP students on a number of
college outcome measures. Ten individual AP Exams
were examined in this study of students in four entering
classes (1998-2001) at the University of Texas at Austin.
The study’s results support previous research showing
that AP students performed as well as, if not better than,
non-AP students on most college outcome measures.

RR No. 2008-7 Item No.: 0480482807 20 pgs 

2008 $15 

Validity of the SAT for Predicting First-Year 
College Grade Point Average 

Jennifer L. Kobrin, Brian F. Patterson, Emily J. Shaw, 
Krista D. Mattern, and Sandra M. Barbuti 
This report presents the results of a large-scale national 
validity study of the SAT. The results show that the 
changes made to the SAT did not substantially change 
how well the test predicts first-year college performance. 
Across all institutions, the recently added writing section 
is the most highly predictive of the three individual SAT 
sections. As expected, the best combination of predictors 
of first-year college grade point average is high school 
grade point average and SAT scores. 

RR No. 2008-5 Item No.: 080482568 10 pgs 

2008 $15 

College Outcomes Comparisons by AP and 
Non-AP High School Experiences 

Linda Hargrove, Donn Godin, and Barbara Dodd 
Performance was examined for five cohorts of 1998-2002 
Texas public high school graduates through their first 
year and 1998-2001 cohorts through their fourth year 
of Texas public higher education. Student performance 
on college outcomes included (a) first- and fourth-
year grade point averages, (b) first- and fourth-year 
credit hours earned, and (c) four-year graduation status. 
Outcomes were compared across students who varied 
by three types of AP (course only, exam only, and 
both course and exam) and two types of non-AP (dual 
enrollment only and other course only) experiences in 
high school. 

RR No. 2008-3 Item No.: 080482548 51 pgs 

2008 $15 

Predicting Grades in Different Types of College 
Courses 

Brent Bridgeman, Judith Pollack, and Nancy Burton 

The ability of high school grades (high school GPA) and 
SAT scores to predict cumulative grades in different 
types of college courses was evaluated in a sample of 26 
colleges. Each college contributed data from three cohorts 
of entering freshmen, and each cohort was followed for 
at least four years. Colleges were separated into four 
levels by average SAT scores. Grade point averages for 
four categories of courses (English; science, math, and 
engineering [S/M/E]; social science; and education) were 
computed, and analyses were run separately for gender 
within race/ethnicity classifications. 

RR No. 2008-1 Item No.: 080482408 27 pgs 

2008 $15 

The SAT as a Predictor of Different Levels of 
College Performance 

Jennifer L. Kobrin and Rochelle S. Michel 
This study explores one of the most persistent questions 
regarding the validity of the SAT: Does the SAT add 
substantially to the prediction of college success after 
high school grades are taken into account? The study
found that the SAT had equal or slightly greater
predictive power than high school grade point
averages.

RR No. 2006-3 Item No.: 060481783 10 pgs 

2006 $15 


The College Board SAT Writing Validation 
Study: An Assessment of Predictive and 
Incremental Validity 

Dwayne Norris, Scott Oppler, Daniel Kuang, Rachel Day, 
and Kimberly Adams 

This study assessed the predictive and incremental 
validity of a prototype version of the new SAT writing 
section that was administered to a sample of incoming 
students at 13 colleges and universities. For these 
participants, SAT scores, high school GPA, and first-year 
grades also were obtained. Using these data, analyses were 
conducted to assess the validity of SAT writing scores for 
predicting first-year college GPA and GPA in English 
composition courses. Consistent with the results of prior 
research, the weighted-average correlation between SAT 
writing scores and first-year college GPA was 0.46 when 
corrected for range restriction. Furthermore, the SAT 
writing scores resulted in a weighted-average increment 
of 0.01 to the predictive validity already provided by SAT 
verbal and math scores and high school GPA in predicting 
first-year college GPA. Also consistent with previous 
research, the weighted-average correlation between SAT 
writing scores and GPA in English composition was 0.32 
when corrected for range restriction. 

RR No. 2006-2 Item No.: 060481782 31 pgs 

2006 $15 

Understanding What SAT Reasoning Test™ 
Scores Add to High School Grades: A 
Straightforward Approach 

Brent Bridgeman, Judy Pollack, and Nancy Burton 
Using a sample of 41 colleges, this study shows substantial 
differences in the percent of students who succeed 
(defined by a 2.5 or 3.5 college grade point average at 
the end of one year or four years in college) by SAT score 
level, even when intensity of the high school curriculum 
and high school grades are taken into account. 

RR No. 2004-4 Item No.: 040481304 20 pgs 

2004 $15 


Effect of Fewer Questions per Section on SAT I 
Scores 

Brent Bridgeman, Catherine Trapani, and Edward Curley 
The impact on SAT I: Reasoning Test scores of allowing 
more time for each question was estimated by reducing 
the number of questions in the standard 30-minute
equating section of two national test administrations. 
Thus, for example, questions were deleted from a verbal 
section that contained 35 questions to produce forms 
that contained 27 or 23 questions. Scores on the
23-question section could then be compared to scores on the
same 23 questions when they were embedded in a section 
that contained 27 or 35 questions. Similarly, questions 
were deleted from a 25-question math section to form 
sections of 20 and 17 questions. Allowing more time 
per question had a minimal impact on verbal scores, 
producing gains of less than 10 points on the 200-800 
SAT scale. Gains for the math score were less than 30 
points. High-scoring students tended to benefit more 
than lower-scoring students, with extra time creating no 
increase in scores for students with SAT scores of 400 or 
lower. Ethnic/racial and gender differences were neither 
increased nor reduced with extra time. 

RR No. 2003-2 Item No.: 996735 16 pgs 

2003 $15 

Predictive Validity of SAT I: Reasoning Test 
for Test-Takers with Learning Disabilities and 
Extended Time Accommodations 

Cara Cahalan, Ellen B. Mandinach, and 
Wayne J. Camara 

The predictive validity of the SAT I: Reasoning Test 
was examined for students who took the test with an 
extended time accommodation for a learning disability. 
The sample included college students with learning 
disabilities who took the SAT I between 1995 and 1998 
with extended time accommodations. First-year grade 
point average (FGPA) was used as a measure of student 
performance. Although positive, the adjusted correlation 
between FGPA and SAT scores was lower for test-takers 
with a learning disability than has been shown in prior 
research on test-takers without disabilities. 

RR No. 2002-5 Item No.: 994216 12 pgs 

2002 $15 
Using Achievement Tests/SAT II: Subject Tests to
Demonstrate Achievement and Predict College 
Grades: Sex, Language, Ethnic, and Parental 
Education Groups 

Leonard Ramist, Charles Lewis, and 
Laura McCamley-Jenkins 

There has been increased interest in emphasizing 
Achievement Tests, as SAT II: Subject Tests, for use 
in admission and placement. In this report, data were 
obtained from a comprehensive database of categorized 
course grades for a large number and great variety of 
colleges, with student groups identified. For each student 
group, the percentage of SAT takers who took any 
Achievement Test and the percentage of Achievement 
Test takers who took each specific test are determined. 
The performance of those who took each Achievement 
Test is compared with the performance of the same 
students on the verbal section of the SAT (for English, 
history, and foreign language tests), the mathematical 
section of the SAT (for mathematics tests), or the sum 
of the verbal and mathematical scores on the SAT 
(for science tests and the average of all of a student’s 
Achievement Test scores). The predictive effectiveness 
of each Achievement Test is determined for predicting 
freshman grade point average, alone and in combination 
with high school grade point average and SAT scores, 
and for predicting grades in each kind of course. Finally, 
one aspect of fairness of each Achievement Test for each 
student group is evaluated in terms of average over- and 
underpredictions. 

RR No. 2001-5 Item No.: 992620 84 pgs 

2001 $15 

An Analysis of Advanced Placement®
Examinations in Economics and Comparative 
Government and Politics 

Hunter M. Breland and Philip K. Oltman 
Advanced Placement Program (AP) Examinations in 
Macroeconomics, Microeconomics, and Comparative 
Government and Politics were studied to examine 
college course performance and gender differences. It 
was concluded that students who had received college 
credit for AP performed as well or better in higher-level 
college courses in Macroeconomics and Microeconomics
than students who had not taken AP courses. Gender
differences in performance were observed in all three 
examinations, but the greatest gender differences, 
favoring male students, were observed for Comparative 
Government and Politics. However, a survey of 
instructors of Comparative Government and Politics 
indicated that only a small percentage of instructors 
had observed gender differences in performance in their 
courses. 

RR No. 2001-4 Item No.: 992583 31 pgs 

2001 $15 

Predicting Success in College: SAT Studies of 
Classes Graduating Since 1980 

Nancy W. Burton and Leonard Ramist 

Studies predicting success in college for students 
graduating since 1980 are reviewed. SAT scores and 
high school records are the most common predictors, 
but a few studies of other predictors are included. The 
review establishes that SAT scores and high school 
records predict academic performance, nonacademic 
accomplishments, leadership in college, and postcollege 
income. The combination of high school records and 
SAT scores is consistently the best predictor. Academic 
preadmission measures contribute substantially to 
predicting academic success (grades, honors, acceptance 
and graduation from graduate or professional school); 
contribute moderately to predicting outcomes with both 
academic and nonacademic components (persistence 
and graduation); and make a small but significant 
contribution to predicting college leadership, college 
accomplishments (artistic, athletic, business), and 
postcollege income. A small number of studies of 
nonacademic predictors (high school accomplishments, 
attitudes, interests) establish their importance, 
particularly for predicting nonacademic success. 

RR No. 2001-2 Item No.: 990299 32 pgs 

2001 $15 


Predictions of Freshman Grade-Point Average 
From the Revised and Recentered SAT I: 
Reasoning Test 

Brent Bridgeman, Laura McCamley-Jenkins, and 
Nancy Ervin 

The impact of revisions in the content of the SAT and 
changes in the score scale on the predictive validity 
of the SAT were examined. Predictions of freshman 
grade point average (FGPA) for the entering class of 
1994 (who had taken the old SAT) were compared with 
predictions for the class of 1995 (who had taken the new 
SAT I: Reasoning Test). The 1995 scores were evaluated 
both on the original SAT Program scale and on the 
recentered scale introduced that year. The changes in 
the test content and recentering of the score scale 
had virtually no impact on predictive validity. Other 
analyses indicated that the SAT I predicts FGPA about 
equally well across different ethnic groups. 

RR No. 2000-1 Item No.: 987443 16 pgs 

2000 $15 

Effects of Extended Time on the SAT I:
Reasoning Test Score Growth for Students with
Learning Disabilities

Wayne J. Camara, Tina Copeland, and Brian Rothschild
Tests administered with accommodations to persons 
with disabilities have been considered nonequivalent 
to tests administered under standardized conditions to 
nondisabled test-takers. This study examined the score 
change patterns for learning disabled students completing 
extended-time administrations of the SAT I: Reasoning 
Test in comparison to nondisabled students retesting 
under standard-time administrations. Results illustrate 
that learning disabled students generally performed 
about .5 of a standard deviation below nondisabled 
test-takers. However, the mean score gain for learning 
disabled students first completing a standard-time SAT 
and retesting under an extended-time SAT was more 
than three times as large as the mean score gain for 
both nondisabled students testing under standardized 
conditions and learning disabled students testing with 
extended time on both occasions. 

RR No. 98-7 Item No.: 050481642 18 pgs 

1998 $15 
Student Group Differences in Predicting College
Grades: Sex, Language, and Ethnic Groups 

Leonard Ramist, Charles Lewis, and 
Laura McCamley-Jenkins 

Part 1 of this study investigated possible causes of the 
observed decline in correlations between SAT scores 
and freshman grade point average (FGPA). Working 
with a database of 38 colleges, the study found that the 
comparability of course grades received by entering 
freshmen declined in the 1980s. Three new measures 
of grade comparability — variety of courses taken, 
variation in average student aptitude among courses, 
and appropriateness of average course grade in relation 
to student aptitude level — proved to be excellent 
indicators of both the level of and the change in SAT 
validity for predicting FGPA. Using course grade as 
a criterion instead of FGPA reduced the decline in 
both SAT and high school GPA (HSGPA) validity for
predicting course grades by 40 percent. Contrary to the
assumption that high school record (HSR) is a better
predictor than the SAT, compared with HSR the SAT 
had higher or equal average validities for predicting 
course grades in almost all categories of courses. Part 
2 examined course selection, grading patterns, grade 
comparability, SAT predictive effectiveness, and mean 
over- and underprediction across different courses for 
groups based on gender, English as best or not best 
language, and ethnicity. All results were analyzed by 
college selectivity level and size. 

RR No. 93-1 Item No.: 217845 41 pgs 

1993 $15 

Performance and Persistence: A Validity Study 
of the SAT for Students with Disabilities 

Marjorie Ragosta, Henry Braun, and Bruce Kaplan 
This study was designed to test the validity of the SAT 
in predicting overall performance and persistence in 
college of students with disabilities, especially those 
participating in special test administrations. An earlier 
validity study (Braun, Ragosta, and Kaplan 1986) used 
first-year grade point averages (FGPA) in college to 
study the validity issue. The current study returned 
to the schools that had originally provided data and 
obtained information on overall grade point averages
and graduation status. Overall college grade point
averages of both disabled and nondisabled students were 
well predicted by SAT scores alone or in conjunction 
with high school grades. SAT scores from special test 
administrations did an adequate job of predicting college 
performance, although there was slight overprediction 
for some groups of disabled students. 

RR No. 91-3 Item No.: 217838 27 pgs 

1991 $15 

Sex Differences in SAT Predictions 
of College Grades 

Lawrence J. Stricker, Donald A. Rock, and
Nancy W. Burton 

This study examined the impact of gender differences 
in the nature of grades and other variables associated 
with academic performance and prediction of college 
grades by the SAT. This study of an entire freshman class 
at a large state university found women’s GPA slightly 
underpredicted by the SAT. Adjusting the GPA for 
differences in grading standards for individual courses 
did not affect the underprediction, but controlling for 
gender differences in individual-difference variables 
concerned with academic preparation, studiousness, 
and attitudes about mathematics reduced or essentially 
eliminated it. 

RR No. 91-2 Item No.: 217836 49 pgs 

1991 $15 

Analysis of the Predictive Validity of the SAT 
and High School Grades from 1976 to 1985 

Rick Morgan 

This study examines predictive validity studies from 
a 10-year period during which the average correlation 
coefficients between the SAT and college grades had 
shown a small, but consistent, downward trend. In order 
to take into account the mix of different institutions 
in each year of validity data, Morgan conducted most 
of his analysis on a subgroup of institutions that had 
conducted validity studies in at least two different years 
during this period. Variations in the range of student 
abilities at these institutions were accounted for by the 
use of a multivariate restriction of range adjustment. The 
analysis of colleges conducting multiple studies found
that the estimates of change in the correlation of SAT
scores with freshman grade point average (FGPA) are 
smaller than the initial yearly averages indicated. It was 
concluded that the decline is not well characterized by 
simple comparisons of average correlations based on the 
total self-selected population of colleges participating in 
the Validity Study Service from one year to another. 

RR No. 89-7 Item No.: 295745 16 pgs 

1989 $15 

Generalization of SAT Validity Across Colleges 

Robert F. Boldt 

This study, which focused on the validity of the SAT-V 
and SAT-M, used data from 99 validity studies conducted 
by the Validity Study Service of the College Board. In 
addition to examining test validities based on first-year 
college grade point averages (FGPA), validities for each 
college were also estimated for applicants for admission 
to the colleges and all SAT takers. These latter two 
estimates were based on range restriction theory. The 
results revealed that the average validity of both the 
SAT-V and SAT-M was estimated to be higher for all
test-takers and for groups of applicants than for test-takers
on whom validity studies were based. It was also found 
that negative, or other very low SAT-validity coefficients, 
should be regarded with suspicion, since they might have 
arisen from using small samples, restriction of range of 
test scores, or the unreliability of the criterion in validity 
studies. 

RR No. 86-3 Item No.: 275891 12 pgs 

1986 $15 

Predicting Predictability: The Influence of 
Student and Institutional Characteristics on the 
Prediction of Grades 

Leonard L. Baird 

This report describes a study that examined the statistical 
and institutional influences on first-year college grades. 
Data came from the Validity Study Service file, which 
summarized the results of College Board validity studies, 
and the College Handbook file, which included data 
about college characteristics. The criterion was the size 
of the multiple correlation between academic predictors 
and first-year college grades. The independent variables
were the statistical data of the validity study and college
characteristics. In general it was found that the extent of 
the variation of the academic ability of the students was 
positively related to the size of the multiple correlation. 
Several variables also suggested the interpretation that 
the heterogeneity of the programs and experience of 
college were negatively related to the size of the multiple 
correlation. 

RR No. 83-5 Item No.: 275873 11 pgs 

1983 $15 

A Review of Research on the Prediction of 
Academic Performance After the Freshman Year 

Kenneth M. Wilson 

The criterion most frequently used in studies designed to 
assess the predictive validity of measures used in college 
admission has been the freshman-year grade point 
average (FGPA). This is a report of a systematic review 
of research bearing on: (a) the validity of admission 
measures for predicting GPA beyond the freshman year, 
i.e., longer-term cumulative or independently computed 
postfreshman GPA, such as senior-year GPA, and (b) 
the comparative relevance and utility of freshman-year,
cumulative, and independently computed
postfreshman-year GPA as criteria for the validation of 
admission measures. Among its findings, the research 
lends support to the traditional practice of employing 
the freshman-year GPA in admission-related predictive 
validity studies. 

RR No. 83-2 Item No.: 275870 43 pgs 

1983 $15 

Older Students and the SAT 

Patricia Lund Casserly 

This report studied the predictive validity of the SAT for 
older students at three universities, and students’ reactions 
to the admissions processes they had completed. Analysis 
supported the use of the SAT with local prediction 
equations for older students. Interviews with older 
students suggested that their range of circumstances 
requires a sensitive use of any admissions instrument — 
and effective counseling and placement. 

RR No. 82-8 Item No.: 275868 

1982 $15 
College Placement

AP Students in College: An Analysis of Five-Year 
Academic Careers 

Rick Morgan and John Klaric 

The purpose of the study was to explore the academic 
careers of students who took AP Exams and to compare 
their careers with those who did not take AP Exams. 
For most AP Exams, students with AP grades of 3 or 
better had higher grade averages in intermediate college 
courses than did non-AP students who first took an 
introductory course. 

RR No. 2007-4 Item No.: 070482287 16 pgs 

2007 $15 

Setting Cut Scores for College Placement 

Deanna L. Morgan and Michalis P. Michaelides 

Due to the high stakes that may be attached to placement 
decisions, it is imperative that the placement process 
be as solid and defensible as possible. An integral 
part of the placement process is the identification and 
use of cut scores, the point(s) on the score scale that 
classify students into adjacent categories for placement 
decisions. This report is geared toward helping college 
administrators make valid decisions regarding setting 
cut scores, focusing particularly on selecting a method, 
but also discussing issues such as defining performance 
levels and validating the process. 

RR No. 2005-9 Item No.: 050481692 12 pgs 

2005 $15 

An Investigation of Educational Outcomes for 
Students Who Earn College Credit Through the 
College-Level Examination Program®

Nancy K. Scammacca and Barbara G. Dodd 
This study investigated the educational outcomes of 
the College-Level Examination Program® (CLEP) for
students who earned credit through CLEP compared to
those students who earned comparable credit through
the Advanced Placement Program (AP) and through
traditional course enrollment. Results indicated that 
CLEP students did as well as, or better than, those in the 
comparison groups in nearly every case. 

RR No. 2005-5 Item No.: 050481412 22 pgs 

2005 $15 

Feasibility of Using the SAT in Academic
Guidance 

Lawrence J. Stricker, Donald A. Rock, and
Nancy W. Burton 

This study appraised the validity of SAT scores, grades 
in high school courses, and the number and difficulty 
level of these courses for predicting college grades 
in various fields of study. The objective of the study 
was to provide SAT takers with predictions of their 
academic performance in different academic fields for 
guidance purposes. The possible impact of this feedback 
on the flow of students into specific major fields was 
also assessed. Data on an entering class at a large state 
university provided the basis for this study. It was found 
that the SAT and other variables based on high school 
performance predicted college grades in different fields 
of study by taking into account marked variations in 
grade distributions among the fields. These predictions 
of letter grades could be potentially useful to students 
in making decisions about college courses and majors. 
Another important finding was that students’ predicted 
grades in the different fields and their intended majors 
were virtually unrelated. 

RR No. 95-1 Item No.: 219095 10 pgs 

1995 $15 

Prediction of Grades in College Mathematics 
Courses as a Component of the Placement 
Validity of SAT-Mathematics Scores 

Brent Bridgeman and Cathy Wendler 
This study examined the placement validity of the SAT-M 
for specific college mathematics courses. The predictive 
validity of SAT-M was evaluated by comparisons to grades
in freshman mathematics courses from 10 colleges.
Considering the relatively low correlations, the test 
content coverage (no advanced algebra or trigonometry), 
and the timing of the test (often administered near or 
before the beginning of the senior year in high school), 
the most reasonable use of SAT scores for placement may 
be as a preliminary screening instrument. High-scoring 
students may well be exempted from basic mathematics 
courses, but students scoring below the cutoff should 
be given another opportunity to demonstrate their 
competence at a time closer to their first semester in 
college. 

RR No. 89-9 Item No.: 217808 36 pgs 

1989 $15 

The Effectiveness of the College Board’s Test of 
Standard Written English for Placing Students 
in Entry-Level English Courses 

Rex Jackson, Jeanette Morgan, and Gerald Osborne 
The effectiveness of the Test of Standard Written English 
(TSWE) as an aid in placing students in introductory 
English courses was studied by relating test scores to 
course outcomes for students entering the University of 
Houston during a five-year period. For students placed 
in the regular introductory English composition course, 
TSWE scores were effective in predicting end-of-course 
grades, essay scores, and test scores. Students with scores 
below 40 on the TSWE were estimated to have less than 
58 chances in 100 of obtaining grades of C or higher 
in the regular course. For students who were placed in 
and completed the remedial course, large gains were 
found from precourse TSWE scores to postcourse TSWE 
scores. 

RR No. 86-1 Item No.: 275890 10 pgs 

1986 $15 
The Validity of the Descriptive Tests of
Language Skills: Relationships to Direct 
Measures of Writing Ability and to Grades in 
Introductory College English Courses 

David Weiss and Rex Jackson 

A pilot study was designed to permit several checks 
on the validity of the Descriptive Tests of Language 
Skills (DTLS). Several types of criterion data were 
collected, including English course grades, scores on 
essays administered concurrently with the DTLS and 
prior to course enrollment, and scores on end-of-term 
essays. The relationship of DTLS scores to these criteria 
provided evidence of the utility and validity of DTLS 
scores for placement in college English courses. 

RR No. 83-4 Item No.: 275872 14 pgs 

1983 $15 

High School Curriculum and 
Grades 

Relationships Between PSAT/NMSQT Scores 
and Academic Achievement in High School 

Glenn B. Milewski and Ellen A. Sawtell 

This study investigated relationships between scores 
on the verbal, mathematics, and writing sections of the 
PSAT/NMSQT and the following indicators of academic 
achievement in high school: years of study, participation 
in specific math and English language arts courses, high 
school grade point average, academic intensity, and 
participation and performance in Advanced Placement 
Program courses. The results showed that there are 
moderate to strong relationships between indicators of 
academic achievement in high school and PSAT/NMSQT 
scores. 

RR No. 2006-6 Item No.: 060481916 16 pgs 

2006 $15 

A Survey to Evaluate the Alignment of the New 
SAT Writing and Critical Reading Sections to 
Curricula and Instructional Practices 

Glenn B. Milewski, Daniel Johnsen, Nancy Glazer, and 
Melvin Kubota 

This report presents the results of a large-scale, national, 
reading and writing curriculum survey and evaluates 
the alignment of the survey results to the reading and 
writing skills measured by the new SAT. The results 
demonstrate a strong link between the skills measured 
by the new SAT and high school and college curricula 
and instructional practice. 

RR No. 2005-1 Item No.: 040481374 31 pgs 

2005 $15 

Whose Grades Are Inflated? 

Wayne J. Camara, Ernest Kimmel, Janice Scheuneman, 
and Ellen A. Sawtell 

There is clear evidence that the average grades earned in 
high school have been going up for some period of time. 
This study examines the question of whether students of 
varying backgrounds have experienced similar increases 
in grade point average (GPA) over a 25-plus-year period. 
Changes in SAT verbal and mathematical scores for the 
same gender and racial/ethnic groups are also examined. 
Trends in the grading practices of major subjects in the 
high school curriculum are presented, as are changes 
in the GPA and test scores for students clustered by the 
type of community in which their school is located and 
whether it is public or nonpublic. 

RR No. 2003-4 Item No.: 30481021 46 pgs 

2003 $15 

The College Board Vocabulary Study 

Hunter M. Breland, Robert J. Jones, and Laura Jenkins 

This study was conducted to provide data on the word 
frequency of different types of reading materials to 
which high school and first-year college students are 
exposed. It began with a comprehensive listing of reading 
materials from curriculum surveys, state curriculum 
guides, private school reading lists, research surveys, 
federal reports, recommended reading lists, and other 
sources. Materials were sampled or entire documents 
were obtained when they were available in electronic 
form. A corpus of 14,360,884 words of running text was 
assembled. This report describes the development of 
the corpus and the computation of the word frequency 
indexes. 

RR No. 94-4 Item No.: 271539 51 pgs 

1994 $15 

College Grades: An Exploratory Study of 
Policies and Practices 

Ruth B. Ekstrom and Ana Maria Villegas 
This report summarizes the grading policies of 14 colleges 
and universities and how those policies have changed between 
1980 and 1990. Grading policies and practices in the business, 
chemistry, education, English, history, mathematics, and 
psychology departments at these institutions and the grading 
orientation and practices of faculty are also summarized. 
The report concludes that there appears to be pressure on 
institutions of higher education and their faculties to reduce 
what the public perceives as lax standards that result in rising 
grade point averages. 

RR No. 94-1 Item No.: 218192 33 pgs 

1994 $15 


An Examination of the Relationships of 
Academic Coursework with Admissions Test 
Performance 

Rick Morgan 

The redesigned Student Descriptive Questionnaire 
(SDQ) provides a great deal of background information 
about examinees sitting for the SAT. One set of questions 
focuses on the number of years and types of courses in 
the students’ academic backgrounds. This information 
makes it possible to explore the relationships between 
course work and performance on the SAT. This 
study used data from the 1987 National Sample Tape, 
which contains SDQ responses and score information 
from 100,000 seniors in the class of ’87. The analysis 
examined the relationships between both the SAT and 
Achievement Test scores and the type and level of high 
school course work in six academic areas. To provide a 
more accurate representation of these relationships, the 
data were adjusted to account for differences related to 
student academic achievement. The results showed that 
course work in the disciplines of mathematics, natural 
science, and foreign languages has the strongest adjusted 
relationships with SAT scores, and the specific course 
relationships appear stronger for male than for female 
examinees. 

RR No. 89-6 Item No.: 295741 37 pgs 

1989 $15 

Surveys of the Use of Hand Calculators and 
Microcomputers in College-Preparatory and 
College Science Classes 

G. Will Pfeiffenberger and Ann Marie Zolandz 
The availability of relatively inexpensive hand calculators 
and microcomputers is believed to be having an 
important effect on the teaching of mathematics and 
science. Therefore, the appropriateness of allowing the 
use of hand calculators on standardized tests has become 
a concern of both educators and organizations involved 
in testing, such as the College Board and ETS. Also of 
interest is the potential for tests that could be delivered 
using microcomputers. The current study utilized a 
survey of secondary school and postsecondary faculty 
to collect information on classroom practices and the 
opinions of teachers at the secondary and postsecondary 
levels. Results can help inform decisions about calculator-use 
policies for standardized tests and about the possible 
incorporation of microcomputers into these tests. 

RR No. 89-4 Item No.: 295731 121 pgs 

1989 $15 

The Impact of Secondary School Honors-Type 
Courses on College-Level Performance 

Donald G. Dickason 

There have long been differences of opinion on the 
predictive value of secondary school honors-type courses 
in the college admission process. This study disproves 
the proposition that an honors-type course grade should 
be promoted one full level (e.g., from a B to an A), but 
it does demonstrate a smaller but measurable positive 
impact on college performance of students successfully 
completing honors-type courses in high school. More 
important, this study demonstrates that the dynamic 
relationships of secondary school predictors and college 
grades are significantly different for honors-taking 
versus nonhonors-taking students. 

RR No. 84-1 Item No.: 275877 9 pgs 

1984 $15 

Grade Inflation and the Validity of the 
Scholastic Aptitude Test 

Isaac I. Bejar and Edwin O. Blew 

The purpose of this study was to clarify the issue of 
grade inflation by examining the database of the College 
Board’s Validity Study Service and to examine the effect 
of grade inflation on the validity of the SAT across a 
period of 15 years. Two types of analysis were performed. 
First, a longitudinal analysis of selected characteristics 
of SAT scores and GPA over a period of 15 years was 
conducted. The second type of analysis focused on a few 
selected schools with the hope of evaluating the effect, if 
any, of grade inflation on the validity of the SAT in those 
colleges. The study concludes that increases in grade 
point average at the collegiate level appear to be due to 
grade inflation and that the rate of grade inflation seems 
to have diminished since 1974. Because of the declining 
validity of the high school record, the SAT has become 
a more valuable tool for predicting academic success in 
college. 

RR No. 81-3 Item No.: 275853 16 pgs 

1981 $15 

Student Characteristics and 
Preparation 

The Relationship of AP Teacher Practices and 
Student AP Exam Performance 

Pamela L. Paek, Henry Braun, Catherine Trapani, Eva 
Ponte, and Don Powers 

This report analyzes the relationship of Advanced 
Placement Program (AP) teacher practices and student 
performance on AP Biology and AP U.S. History Exams. 
Using a national survey of AP teachers, the study 
developed four models for each subject with public 
school teachers only and both public and nonpublic 
school teachers, using two standards of success (scoring 
3 or better and scoring 4 or better on the exams). 
Professional development and school and class context 
were statistically significant across all models; however, 
types of professional development differed. Resources 
were important for AP U.S. History teachers, while class 
size and schedule impacted AP Biology teachers. This 
indicates additional resources might enhance learning 
in AP U.S. History, while AP Biology teachers might be 
more effective with smaller, daily classes. 

RR No. 2007-5 Item No.: 070482345 49 pgs 

2007 $15 

The Impact of Course-Taking on Performance 
on SAT Items with Higher-Level Mathematics 
Content 

Hui Deng and Jennifer L. Kobrin 

This report summarizes the results of two studies 
designed to evaluate the impact of self-reported 
mathematics course-taking on performance on SAT 
mathematics questions measuring new content (Algebra 
II). Both studies analyzed data collected during the 
field trial of the new SAT. In study 1, standardized 
mean differences (effect sizes) were computed between 
students taking or planning to take certain mathematics 
courses and those not taking such courses to show the 
impact of course-taking on performance on old and 
new SAT mathematics questions. For both the old and 
new items, students who took a course scored higher 
than students who planned to take or didn’t take the 
course. Study 2 focused on the impact of taking or 
planning to take more advanced mathematics courses 
than Algebra II on old and new math item performance. 
It was observed that students who planned to take more 
advanced courses scored higher than students who did 
not plan to take any advanced courses on the old and the 
new content. 

RR No. 2006-8 Item No.: 060482015 13 pgs 

2006 $15 

Everyone Gains: Extracurricular Activities 
in High School and Higher SAT Scores 

Howard T. Everson and Roger E. Millsap 
This report presents evidence that links participation 
in extracurricular activities in high school with higher 
SAT scores. The analyses suggest that participation 
in extracurricular activities benefits minority and 
socioeconomically disadvantaged students as much as, 
or more than, economically advantaged white students. 

RR No. 2005-2 Item No.: 040481375 7 pgs 

2005 $15 

New SAT Writing Prompt Study: Analyses of 
Group Impact and Reliability 

Hunter M. Breland, Melvin Kubota, Kristine Nickerson, 
Catherine Trapani, and Michael Walker 
This study investigated the impact on ethnic, language, 
and gender groups of a new kind of essay prompt 
type intended for use with the new SAT. The study 
also generated estimates of the reliability of scores 
obtained using the prompts examined. To examine 
the impact of a new prompt type, random samples of 
11th-grade students in 49 participating high schools were 
administered writing tests using four different prompts, 
two of an old type and two of a new type. To obtain 
estimates of the reliability of scores for the old and new 
types of prompts, schools were asked to participate in a 
second round of testing to occur four months after the 
initial testing. Results of the impact analyses revealed 
no significant prompt type effects for ethnic, gender, 
or language groups, although there were significant 
differences in mean scores for ethnic and gender groups 
for all prompts. 

RR No. 2004-1 Item No.: 030481024 20 pgs 

2004 $15 

Examining the Relationship of Content to 
Gender-Based Performance Differences in 
Advanced Placement Exams 

Gary Buck, Irene Kostin, and Rick Morgan 

The purpose of this study is to examine the content 
of the questions in a number of Advanced Placement 
Examinations and to attempt to identify content that is 
related to gender-based performance differences. Free- 
response questions for 10 forms of the AP Exams in U.S. 
History, European History, Biology, Microeconomics, 
and Macroeconomics were studied, and the multiple- 
choice items for four forms of AP U.S. History were 
also studied. The study suggests that item content is 
associated with gender-based performance differences. 

RR No. 2002-12 Item No.: 040481188 34 pgs 

2002 $15 

Minority Student Success: The Role of Teachers 
in Advanced Placement Program® (AP) Courses 

Nancy W. Burton, Nancy Burgess Whitman, Mario 
Yepes-Baraya, Frederick Cline, and R. Myung-in Kim 
This report describes the characteristics and teaching 
behaviors of those successfully teaching AP Calculus 
AB and AP English Literature and Composition to 
underrepresented minority students. The purpose of the 
study is to assist educators in improving the participation 
and performance of underrepresented minority students 
in AP classes. Study results show that successful teachers 
of minority students are good teachers for all groups. 
They express a high opinion of students, both majority 
and minority, and hold them to high standards. They 
make sure that students understand and can apply the 
fundamental concepts in the discipline. They also help 
students and parents understand and feel comfortable 
about college. 

RR No. 2002-8 Item No.: 040481185 81 pgs 

2002 $15 

Knowing What You Know and What You Don’t: 
Further Research on Metacognitive Knowledge 
Monitoring 

Howard T. Everson and Sigmund Tobias 
To date the authors have completed 23 studies of 
knowledge monitoring and its relationship to learning 
from instruction. Their earlier work, 12 studies in all, 
is summarized and reported elsewhere (see Tobias and 
Everson, 1996; Tobias and Everson, 2000). In this paper 
they continue this line of research and summarize the 
results of 11 studies that have been conducted over the 
past three years. The work reported here attempts to 
address a number of general issues, e.g., the domain 
specificity of knowledge monitoring, measurement 
concerns, and the relationship of knowledge monitoring 
to academic ability. 

RR No. 2002-3 Item No.: 993815 25 pgs 

2002 $15 

Measuring Educational Disadvantage of SAT 
Candidates 

Lawrence J. Stricker, Judith M. Pollack, Donald A. Rock, 
and Harold H. Wenglinsky 

This study explored individual differences in educational 
disadvantage — deficits in formal and informal education 
in the school, home, and elsewhere — in the SAT test- 
taking population. Factor analysis identified six 
educational disadvantage factors — four concerning the 
students’ schools and two the students’ nativity and 
parenting — and one family socioeconomic status factor, 
race/ethnicity, high school grades, and SAT scores. 
The individual-differences perspective on disadvantage 
appears to be a viable one, and educational disadvantage 
seems to be a meaningful and useful construct. 

RR No. 2002-1 Item No.: 993622 22 pgs 

2002 $15 

Swimming Against the Tide: The Poor in 
American Higher Education 

Patrick T. Terenzini, Elena M. Bernal, and 
Alberto F. Cabrera 

Despite an enormous investment in equalizing 
educational opportunities for all Americans, substantial 
evidence indicates that significant inequities remain, 
particularly for low-socioeconomic-status (SES) students. 
The report draws on an extensive review of the current 
research literature and contributes new analyses of 
national databases to fill in some of the holes in the 
existing literature. Among the findings are: (1) by the 
ninth grade, most students have developed occupational 
and educational expectations that are strongly related to 
SES; (2) parents’ knowledge of financial aid, financial 
planning for college, and students’ access to college and 
financial aid information are clearly associated with SES; 
and (3) nearly one-half of the lowest-SES-quartile high 
school graduates do not enroll the following fall in any 
postsecondary institution, a nonenrollment rate nearly 
five times higher than that of high-SES students. 

RR No. 2001-1 Item No.: 989828 52 pgs 

2001 $15 

Group Differences in Standardized Testing and 
Social Stratification 

Wayne J. Camara and Amy E. Schmidt 
Group differences among ethnic and racial groups on 
a series of educational measures and outcomes are 
examined. African American and Hispanic students 
perform substantially lower than white and Asian students 
on the SAT I. These substantial differences also exist on a 
variety of other admissions tests used for undergraduate, 
graduate, and professional programs. Similar differences 
are found on national testing programs such as NAEP 
and NELS, as well as on a variety of performance 
assessments. These results are consistent with differences 
in high school grades, the rigor and intensity of high 
school curriculum, college performance, and graduation 
among these groups. Differences in socioeconomic 
status (e.g., parental education and family income) are 
examined across these measures within and across 
ethnic and racial groups, and account for a large portion 
of the group differences found across these educational 
measures and outcomes. 

RR No. 99-5 Item No.: 275898 24 pgs 

1999 $15 


Effects of Coaching on SAT I: Reasoning Scores 

Donald E. Powers and Donald A. Rock 
A College Board-sponsored survey of a nationally 
representative sample of takers of the 1995-96 
SAT I: Reasoning Test yielded a database for more than 
4,000 examinees, about 500 of whom had attended 
formal coaching programs outside their schools. Several 
alternative analytical methods were used to estimate the 
effects of coaching on SAT scores. The various analyses 
produced somewhat different estimates. All of the 
estimates, however, suggested that the effects of coaching 
are far less than is claimed by major commercial test 
preparation companies. The revised SAT did not appear 
to be any more coachable than its predecessor. 

RR No. 98-6 Item No.: 040481184 17 pgs 

1998 $15 

Preparing for the SAT I: Reasoning Test — 
An Update 

Donald E. Powers 

To document the extent of special test preparation for 
the SAT I: Reasoning Test, a stratified random sample 
of some 6,700 students who registered to take the SAT 
in 1995-96 was surveyed. A smaller companion survey 
sought information about special preparation programs 
from a stratified random sample of secondary schools 
whose students take the SAT. The objectives were to: 
determine the availability, and incidence of use, of a 
variety of programs and resources designed to prepare 
students to take the SAT; describe some of the salient 
features of these resources; and estimate the amount 
of time (and money) that students spend on preparing 
for the test. Though the surveys differed slightly 
from similar surveys conducted in 1986-87, they were 
designed generally to enable comparison with the results 
of the earlier surveys. The student survey found that 
prospective SAT takers participate, to varying degrees, 
in a variety of preparation activities, and, on average, 
students spend approximately 11 hours preparing for 
the SAT. The results of the school survey revealed that a 
slight majority (52 percent) of all secondary schools now 
offer programs to prepare students for the SAT, about the 
same proportion (49 percent) as in 1986-87. 

RR No. 98-5 Item No.: 217997 23 pgs 

1998 $15 

Knowledge Structures and Adult Intellectual 
Development 

Philip L. Ackerman in collaboration with Eric L. Rolfhus 

This report reviews a theoretically inspired empirical 
investigation of individual differences in knowledge, 
abilities, and nonability traits as part of an ongoing effort 
to better understand adult intellectual development and 
to develop more accurate measures of adult intelligence. 
Twenty Knowledge Scales were constructed, drawing 
on College-Level Examination Program (CLEP) and 
Advanced Placement Program (AP) Examinations. 
These Knowledge Structures were administered, along 
with an extensive battery of traditional ability tests, 
and measures of personality, interests, and self-concept, 
to two samples of adults, a “younger” adult group, age 
18-27, and an “older” adult group, age 30+. Results 
indicate that, in general, the older adult group showed a 
much higher degree of orientation toward “intellectual” 
aspects than the younger adult group, as indicated by 
scores on interest, personality, and self-concept scales. 

RR No. 98-3 Item No.: 200142 25 pgs 

1998 $15 

Inquiring About Examinees’ Ethnicity and 
Sex: Effects on Computerized Placement Tests 
Performance 

Lawrence J. Stricker and William C. Ward 
Laboratory experiments by Steele and Aronson (1995) 
found that African American subjects’ performance on 
difficult verbal items, described as verbal problem-solving 
tasks, was adversely affected when they were asked about 
their ethnicity just before working on the items. These 
results were attributed to “stereotype threat”: Asking 
about ethnicity primes African American subjects’ 
concerns about fulfilling the negative ethnic stereotype 
about their intellectual ability, thereby disrupting test 
performance. The present field experiment assessed the 
effects of asking community college students taking 
the Computerized Placement Tests (CPTs), in an actual 
operating setting, about their ethnicity and sex. This 
inquiry had no statistically or practically significant effects 
on how well the examinees did or on how long they 
worked on the tests. 

RR No. 98-2 Item No.: 040481182 10 pgs 

1998 $15 

Inquiring About Examinees’ Ethnicity and 
Sex: Effects on AP Calculus AB Examination 
Performance 

Lawrence J. Stricker 

Steele and Aronson (1995) found that the performance 
of African American subjects on test items portrayed 
as a problem-solving task, in a laboratory experiment, 
was adversely affected when they were asked about their 
ethnicity. This outcome was attributed to “stereotype 
threat”: Performance was disrupted by the subjects’ 
concerns about fulfilling the negative stereotype 
concerning African Americans’ intellectual ability. 
Extending that research, this field experiment evaluated 
the effects of inquiring about ethnicity and sex on 
the performance of examinees taking the Advanced 
Placement Program Calculus AB Examination in an 
actual test administration. With a minor exception, this 
inquiry had no statistically and practically significant 
effects on the test performance of African American, 
female, or other subgroups of examinees. 

RR No. 98-1 Item No.: 040481181 16 pgs 

1998 $15 

Correlates of Gender Differences in Cognitive 
Functioning 

Gita Z. Wilder 

This report offers a broad overview of the three major 
categories of explanations of gender patterns in cognitive 
functioning. Two of the major categories are biological 
and psychosocial. The third category, explanations that 
have been attributed to differences in the educational 
experiences of men and women, is treated separately 
because while such explanations are most appropriately 
considered a subset of psychosocial factors, they have 
special significance in the context of assessing cognitive 
ability. 

RR No. 96-03 Item No.: 251735 28 pgs 

1996 $15 

Assessing Metacognitive Knowledge Monitoring 

Sigmund Tobias and Howard T. Everson 

This report describes 12 studies dealing with the 
knowledge monitoring component of metacognition. It 
is assumed that knowledge monitoring is basic to other 
metacognitive activities, such as evaluating learning, 
selecting appropriate strategies, or planning, because 
distinguishing between what students know and do 
not know ought to be a prerequisite for these other 
higher-level activities. The 12 studies, 10 in the verbal 
domain and 2 in mathematics, used various versions 
of a knowledge monitoring assessment (KMA) that 
evaluates the discrepancy between students’ estimates 
of their knowledge and their demonstrated knowledge 
in a domain on a multiple-choice test. The results 
provide a good deal of support for the construct validity 
of the KMA and suggest that it has considerable 
generalizability over different types of content and 
varying student populations. Since the KMA may be 
group- or computer-administered and is objectively 
scored, it has substantial advantages over other means of 
evaluating metacognition. 

RR No. 96-01 Item No.: 200228 38 pgs 

1996 $15 

Analysis of the Revised Student Descriptive 
Questionnaire: Phase II Predictive Validity of 
Academic Self-Report 

Norman E. Freeberg, Donald A. Rock, and Judith Pollack 

An initial study phase examining the revised (1985) 
Student Descriptive Questionnaire (SDQ) assessed the 
accuracy of student self-report data on that instrument 
and found it to be of sufficient accuracy for its intended 
uses in admission and placement. This current phase 
of study examined the adequacy of the revised SDQ in 
terms of the predictive validity of its student academic 
self-report information against a criterion of first-year 
college achievement (FGPA). Findings indicated that 
the validities are consistent with those of earlier studies 
using the original version of the SDQ, as well as with 
other similar self-report instruments used with college 
applicants, and can be appropriately used for admission. 

RR No. 89-8 Item No.: 217893 16 pgs 

1989 $15 

Sex Differences in Test Performance: A Survey 
of the Literature 

Gita Z. Wilder and Kristin Powell 

During the past several decades, extensive research has 
documented and attempted to explain and understand 
the differences between men and women on a wide 
range of education outcomes. Although educators and 
researchers have long been aware that such differences 
exist, public attention has only recently focused on 
the topic. This report, therefore, represents a timely 
and useful summary of significant research that has 
already been conducted and provides a context for 
future evaluation. More important, it discusses various 
hypotheses that have been advanced to explain observed 
differences and suggests interventions that might work 
toward eliminating such differences. 

RR No. 89-3 Item No.: 275974 50 pgs 

1989 $15 

Sex Differences in SAT Scores 

Nancy W. Burton, Charles Lewis, and Nancy Robertson 

This study explored the association of demographic 
differences between men and women, their effect on 
differences in the SAT scores, and whether changes 
in these demographic variables over time are related 
to SAT score trends. After adjusting for differences 
in background, women’s average SAT verbal scores 
were found to be higher than, or nearly equal to, 
men’s. Although women’s average SAT mathematical 
scores after adjustment were still lower than men’s, they 
were 25 points higher when adjusted for background. 
This report’s analysis established that the background 
differences between men and women were significantly 
related to verbal and mathematical score differences. 

RR No. 88-9 Item No.: 218112 23 pgs 

1988 $15 


Preparing for the SAT: A Survey of Programs 
and Resources 

Donald E. Powers 

To document the extent of special test preparation for 
the SAT, two separate surveys were conducted — one of a 
stratified random sample of 1986-87 SAT takers and the 
other of a stratified random sample of secondary schools 
whose students take the SAT. The objectives were to: 
(1) determine the availability, and incidence of use, of a 
variety of programs and resources designed to prepare 
students to take the SAT; (2) describe some of the salient 
features of these resources; (3) estimate the amount 
of time (and money) that students spend on these 
resources; and (4) obtain examinees’ reactions regarding 
the effectiveness of these resources. The results of these 
surveys revealed that nearly half of all secondary schools 
offer special programs of preparation for the SAT. These 
programs differ somewhat in their availability according 
to the geographic region, locale, and degree to which 
schools also provide various other kinds of courses. 
About 11 percent of all students in the survey said they 
had attended preparation or coaching sessions outside 
school. 

RR No. 88-7 Item No.: 218279 31 pgs 

1988 $15 

Analysis of the Revised Student Descriptive 
Questionnaire, Phase I Accuracy of Student- 
Reported Information 

Norman E. Freeberg 

As a self-report instrument, the Student Descriptive 
Questionnaire (SDQ) has, since 1971, enabled college 
applicants to describe a range of interests, activities, 
plans, and abilities in both academic and nonacademic 
areas. This study provides a preliminary examination 
of student accuracy of self-reported data on the revised 
SDQ. In this initial phase of the study, key items of 
student-reported information were shown to possess 
high levels of accuracy that indicated the suitability of 
the new form for its intended purposes, as well as its 
comparability with earlier versions of the SDQ and other 
student self-report questionnaires. 

RR No. 88-5 Item No.: 218276 25 pgs 

1988 $15 


Sex Differences in the Academic Performance 
of Scholastic Aptitude Test Takers 

Mary Jo Clark and Jerilee Grandy 

The number of female college students has increased 
dramatically over the past 15 years; in this same period, 
the average SAT scores for women have declined more 
than the scores for men. This study summarizes recent 
evidence concerning the academic performance of 
women and men by examining gender differences among: 
(1) all SAT takers; (2) test-takers grouped by anticipated 
major field of study; and (3) college freshman-year 
courses and grades. Consistent with recent literature 
on gender differences in cognitive performance, this 
study concludes that gender-related SAT differences 
are very small relative to the generally similar levels of 
performance by men and women, and that using both 
test scores and high school records to predict first-year 
college grades continues to work reasonably well for 
both sexes. 

RR No. 84-8 Item No.: 275884 27 pgs 

1984 $15 

A Profile of Preparation in Mathematics 

Gordon A. Hale and Beverly Whittington 
A self-evaluation instrument entitled the Mathematics 
Inventory was developed to generate a profile of 
students’ preparation in core areas of secondary school 
mathematics. This report discusses initial research 
using a draft version of the inventory. Appended to 
the report are the inventory along with prototypes of 
reports provided to students and to schools. Preliminary 
evidence suggests that the inventory responses may be 
at least moderately related to certain other indexes of 
mathematics proficiency; questionnaire results indicate 
that students see the inventory as useful. 

RR No. 84-6 Item No.: 275882 26 pgs 

1984 $15 

A Profile of Preparation in English: Phase II 

William C. Ward and Sybil B. Carlson 
This study attempted to develop and validate a 
method for collecting and reporting information about 
students’ preparation in English for college-level work. 
Information was gathered from students through a self- 
report inventory, Experiences in English. As a result of 
this study, two prototype reports were developed. The 
first is a report to an individual student of his or her 
own preparation, while the second is a summary for an 
institution based on the responses of a group of students. 
This report discusses the development, potential use, 
and effectiveness of these prototype reports. 

RR No. 84-2 Item No.: 275878 17 pgs 

1984 $15 

Characteristics and Career Choices of 
Adolescent Girls 

Maureen Welsh, S.H.C.J. 

Career choices during adolescence may be related to 
personal characteristics such as values, interests, life 
goals, abilities, and self-image. The purpose of this study 
was to: (1) identify the personal characteristics of ninth- 
grade girls as well as their career choices during ninth 
grade; (2) isolate any personal characteristics of ninth- 
grade girls that were associated with their career choices 
and that distinguish them from girls with other career 
choices; and (3) detect any characteristics of their parents 
and of their schools that were associated with their 
career choices and that distinguish them from girls with 
other career choices. This study included 850 female 
students in mid-Atlantic schools that differed in size, 
control, location, ethnic composition, and percentage of 
graduates pursuing further education. The study found 
that female students need academic achievement, career 
exploration, and curriculum-related activities in school 
and in their community to attain their educational and 
career goals. 

RR No. 83-3 Item No.: 275871 15 pgs 

1983 $15 

Comparison of Male and Female Performance 
on the ATP Physics Test 

Patricia Wheeler and Abigail Harris 
This study examined a variety of student-level data 
that could possibly account for or help in interpreting 
the differences between males and females in overall 
performance on the Physics Achievement Test. Prior 
experience and success in physics (e.g., number of 
semesters of physics or math), characteristics of students, 
and overall level of performance on the Physics Test 
related to performance on individual items or groups 
of items were examined to help interpret the overall 
performance differences between male and female 
students. Although no simple explanation for the 
performance discrepancy was found, the number of 
semesters of physics that a test candidate had completed 
proved to be an important variable. 

RR No. 81-4 Item No.: 275854 41 pgs 

1981 $15 

Access and Retention in 
Higher Education 

Is Performance on the SAT Related to College 
Retention? 

Krista D. Mattern and Brian F. Patterson 

This study examines the relationship between scores 
on the SAT and retention to the second year of college 
using student level data from the freshman class of 2006 
at 106 four-year institutions. Results indicate that the 
SAT predicts second-year retention, with 95.5 percent 
of high performers returning but only 63.8 percent of 
low performers returning. While retention rates do vary 
by subgroups (i.e., gender, ethnicity, parental income, 
and highest parental education) and institutional 
characteristics (i.e., control, selectivity, size), these 
differences are moderated when SAT performance and 
other indicators of academic preparation are considered. 
RR No. 2009-7 Item No.: 09b-429 23 pgs 

2009 $15 

The Impact of Flagging on the Admission 
Process: Policies, Practices, and Implications 

Ellen B. Mandinach, Cara Cahalan, and 
Wayne J. Camara 

This study represents a first step in trying to gain a 
better appreciation for the complexity of the issues 
surrounding flagging test scores taken with nonstandard 
conditions and how the admissions process can better 
serve students with disabilities. Surveys were sent to 
admissions officers, guidance counselors, and disability 
service providers at colleges and universities to examine 
their institutional policies and practices. In addition, 
interviews and focus groups were conducted. It is clear 
from the results of this study that perceptions about the 
use of the flag for nonstandard test administrations differ 
based on the role the respondent plays in the admissions 
process. Although differences exist with regard to the 
use of the flag, all three groups perceived an equity 
problem concerning how students with disabilities are 
identified, what documentation is required, and what 
services are provided to these students. 

RR No. 2002-2 Item No.: 993907 57 pgs 

2002 $15 

Substituting SAT II: Subject Tests for SAT I: 
Reasoning Test: Impact on Admitted Class 
Composition and Quality 

Brent Bridgeman, Nancy W. Burton, and Frederick Cline 

Using data from a sample of 10 colleges at which most 
students had taken both SAT I: Reasoning Test and SAT 
II: Subject Tests, the authors simulated the effects of 
making selection decisions using SAT II scores in place 
of SAT I scores. Specifically, they treated the students 
in each college as though they comprised the applicant 
pool for a more selective college, and then selected the 
top two-thirds (and top one-third) of the students using 
high school grade point average combined with either 
SAT I scores or the average of SAT II scores. Success 
rates, in terms of freshman grade point averages (FGPA), 
were virtually identical for students selected by the 
different models. The percent of African American, 
Asian American, and white students selected varied 
only slightly across models. Appreciably more Mexican 
American and other Latino students were selected with 
the model that used SAT II scores in place of SAT I scores, 
because these students submitted Subject Test scores for 
the Spanish test on which they had high scores. 
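
The selection simulation summarized above can be illustrated with a minimal sketch. It is not the authors' procedure: the equal-weight sum of standardized scores, the function and field names, and the six toy records are all assumptions made only to show the idea of selecting a top fraction under two models and comparing freshman grades.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of numbers using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def select_top(students, test_key, fraction):
    """Return the top `fraction` of students, ranked by a composite of
    high school GPA and one test score (equal weights are an assumption)."""
    gpa_z = zscores([s["hsgpa"] for s in students])
    test_z = zscores([s[test_key] for s in students])
    composite = [g + t for g, t in zip(gpa_z, test_z)]
    # Rank by composite; the index breaks ties and maps back to the record.
    ranked = sorted(zip(composite, range(len(students))), reverse=True)
    keep = round(len(students) * fraction)
    return [students[i] for _, i in ranked[:keep]]

def mean_fgpa(selected):
    """Mean freshman GPA of a selected group."""
    return mean(s["fgpa"] for s in selected)

# Hypothetical records, invented only to exercise the functions.
pool = [
    {"hsgpa": 3.9, "sat1": 1400, "sat2": 700, "fgpa": 3.6},
    {"hsgpa": 3.5, "sat1": 1250, "sat2": 680, "fgpa": 3.2},
    {"hsgpa": 3.2, "sat1": 1300, "sat2": 600, "fgpa": 3.0},
    {"hsgpa": 3.0, "sat1": 1100, "sat2": 650, "fgpa": 2.9},
    {"hsgpa": 2.8, "sat1": 1150, "sat2": 560, "fgpa": 2.6},
    {"hsgpa": 2.5, "sat1": 1000, "sat2": 540, "fgpa": 2.4},
]

by_sat1 = select_top(pool, "sat1", 2 / 3)  # model using the first test score
by_sat2 = select_top(pool, "sat2", 2 / 3)  # model using the second test score
```

Comparing `mean_fgpa(by_sat1)` with `mean_fgpa(by_sat2)` mirrors, in miniature, the kind of success-rate comparison the abstract describes.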

RR No. 2001-3 Item No.: 991380 12 pgs 

2001 $15 

Improving the Odds: Factors that Increase 
the Likelihood of Four-Year College Attendance 
Among High School Seniors 

Jacqueline E. King 

The central purpose of this study was to identify 
factors that increase the likelihood that high school 
seniors will plan to attend a four-year college, paying 
particular attention to variables that are associated with 
college attendance by low-income students. Logistic 
regression was applied to data from three sources: a 
telephone interview of high school seniors who took 
the SAT I: Reasoning Test, a paper-and-pencil survey 
that students completed when they registered for the 
SAT, and the students’ combined SAT scores. This 
study evaluated how effectively eight factors, or sets 
of variables, predicted whether these seniors planned 
to attend a four-year college or university. In addition, 
this study found two previously untested variables to 
be particularly important predictors. The number of 
years students spent taking college-preparatory courses 
had a significant positive effect on the probability that 
they planned to attend a four-year college or university. 
The findings also suggested that counselors play a more 
important role than had previously been identified. 

RR No. 96-02 Item No.: 200230 34 pgs 

1996 $15 

Attitudes Toward Borrowing and Participation 
in Postsecondary Education 

Ruth B. Ekstrom 

High school seniors who are likely to borrow money 
when college costs substantially exceed what they, their 
family, and a scholarship can provide are significantly 
more likely to attend college than other students who 
would choose other options (delaying college entrance, 
attending a less expensive college, or getting a job). 
The policy shift from grants to loans as the major 
form of student financial aid had been blamed for 
the diminished participation of minority students in 
higher education. However, the analysis that examined 
variables associated with attitudes toward borrowing 
did not show any significant effect on attitude by race or 
ethnicity after variables such as knowledge about costs, 
educational aspirations, achievement, influence from 
others, and socioeconomic status were considered. 

RR No. 92-6 Item No.: 219300 12 pgs 

1992 $15 

An Evaluation of a Kit to Prepare 
Hispanic Students for the PSAT/NMSQT 

Maria Pennock-Román, Donald E. Powers, and 
Monte Perez 

A kit containing materials intended to familiarize 
Hispanic students with the PSAT/NMSQT was developed 
by the College Board, ETS, and the Hispanic Higher 
Education Coalition. This report provides some data on 
the extent to which the kit’s objectives were achieved at 
some of the sites in which it has been used. Reactions 
to the kit were obtained from both staff and students. 
A number of these comments have value in planning 
the final revision and distribution of the kit. There is 
relatively strong evidence that programs involving the use 
of the test-familiarization kit can successfully encourage 
disadvantaged students to take the PSAT/NMSQT. 
Students’ comments suggested that not only their test- 
taking but also their problem-solving and language skills 
improved after use of the kit materials, which may lead to 
real improvement in their college preparedness. 

RR No. 89-1 Item No.: 239564 20 pgs 

1989 $15 

Handicapped Applicants to College: An Analysis 
of Admissions Decisions 

Warren W. Willingham 

Federal regulations protect the rights of handicapped 
students regarding admission testing and throughout the 
general admissions process. The purpose of this study 
was to compare admissions decisions of handicapped 
and nonhandicapped applicants who have comparable 
SAT scores and high school grades. The main finding 
was that handicapped applicants were admitted on 
much the same basis as nonhandicapped applicants, but 
there were exceptions that favored hearing-impaired 
applicants, disfavored small groups of visually impaired 
and physically handicapped applicants to small 
institutions, and disfavored learning disabled applicants 
to a lesser degree. 

RR No. 87-1 Item No.: 275899 18 pgs 

1987 $15 

State Policies for Admission to Higher Education 

Margaret E. Goertz and Linda M. Johnson 
The purpose of this study was to provide comprehensive 
information on statewide college admission standards. 
This report describes state policies regulating admission 
to colleges and universities and special admission 
policies affecting subgroup populations in the 50 states, 
and discusses trends in state admission policies. Nearly 
half of the states impose statewide minimum admission 
requirements on their public colleges and universities. 
Nine states have an open admission policy, while 13 
states require entering freshmen to meet a minimum 
test score or GPA, class rank, and/or other performance 
standard. Sixteen states enacted, or are proposing, more 
stringent admission policies. 

RR No. 85-1 Item No.: 275887 31 pgs 

1985 $15 

A Look at Part-Time Undergraduates: 
Enrollment Trends, Admission Requirements, 
and Characteristics of Those Taking the SAT 

Jerilee Grandy and Rosalea Courtney 
The number of undergraduates studying part-time in 
four-year colleges and universities has been steadily 
increasing over the past decade. The purpose of this 
project was to: (1) identify basic characteristics of SAT 
candidates planning to attend college part-time; (2) 
examine the trends in part-time enrollment in colleges 
requiring the SAT; (3) investigate the policies of those 
colleges regarding admission requirements for part- 
time students; and (4) determine those colleges’ level 
of concern about the validity of the SAT for part- 
time students. The findings indicate that the greatest 
proportional increases in part-time freshman enrollment 
were in highly selective institutions, i.e., those with 
average scores over 1200, and in the least selective 
colleges, i.e., those with scores under 700. Part-time 
candidates came from lower socioeconomic status 
families, on the average, and had a greater proportion of 
minorities. Part-time matriculated students beginning 
college just after high school were generally treated no 
differently from their full-time colleagues unless they 
were enrolled in a division specifically for part-time 
students. 

RR No. 84-4 Item No.: 275880 18 pgs 

1984 $15 

Access to College for Mexican Americans in the 
Southwest: Replication After 10 Years 

Rose M. Payán, Richard E. Peterson, and 
Nancy A. Castille 

This article is based on a 1982 survey of Hispanic higher 
education enrollment and related practices and issues. 
Addressed to financial aid officers in five southwestern 
states, it was a replication of a similar study carried out in 
1972. In addition to presenting comparable results from 
the two surveys, the article reviews recent literature, 
comments on critical issues in Hispanic access, and 
outlines a number of implications from the study for 
expanding access to college for Hispanic youth. The 
most striking finding is that while Mexican American 
enrollment in higher education in the Southwest nearly 
doubled during the 1970s, during this study their 
numbers, as a percentage of total enrollment, increased by 
only 1 percent — from 10 to 11 percent. When compared 
to their proportion in the total population — 17 percent 
in 1972, 20 percent in 1983 — the increased magnitude of 
their underrepresentation becomes clear. 
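
One way to read these figures is as a ratio of enrollment share to population share. This is an illustrative calculation, not one taken from the report, and the function name is hypothetical; a ratio of 1.0 would mean enrollment proportional to population.

```python
def representation_ratio(enrollment_share, population_share):
    """Enrollment share divided by population share (both in percent)."""
    return enrollment_share / population_share

# Percentages quoted in the abstract above.
ratio_1972 = representation_ratio(10, 17)  # 10 percent of enrollment, 17 percent of population
ratio_1983 = representation_ratio(11, 20)  # 11 percent of enrollment, 20 percent of population
```

Under this reading the ratio falls from roughly 0.59 to 0.55, so the enrollment share rose by a point while representation relative to population declined.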

RR No. 84-3 Item No.: 275879 30 pgs 

1984 $15 

College Student Attrition and Retention 

Leonard Ramist 

This study reviews research on college student attrition 
and retention examining overall dropout rates and the 
reasons students give for dropping out. Also examined 
are the demographic, academic, motivational, and 
personal characteristics of students who are likely 
to drop out and how general college environmental 
factors relate to persistence. College programs that 
would upgrade the level of educational service, thereby 
encouraging students to stay, are also examined. Based 
on a representative cross section of four-year colleges, 
the study found that 35-40 percent of entering freshmen 
graduate in four years from their college of original 
entry. The reasons students give for dropping out include 
academic matters, financial difficulties, motivational 
problems, personal considerations, dissatisfaction with 
college, military service, full-time jobs, the expressed 
need for new, practical, nonacademic experiences, and 
the lack of initial plans to obtain a degree. While some 
college environments are more conducive to persistence 
than others, most research has concluded that the fit 
between student and college is an important factor (e.g., 
a student from a small town is more likely to persist at a 
small college). 

RR No. 81-1 Item No.: 275851 37 pgs 

1981 $15 

Research on Item Formats 
and Scoring 

Investigating the Effects of Increased 
SAT Reasoning Test Length and Time on 
Performance of Regular SAT Examinees 

Xiang Bo Wang 

This study investigates the effect of the increased test 
length due to the addition of the writing portion on the 
SAT Reasoning Test™. Three data sets were analyzed in 
this research. The first data set was from the first SAT 
Reasoning Test administration in March 2005; the second 
data set came from the October 2005 administration; and 
the third data set came from the May 2002 administration. 
The report found no evidence that the current SAT 
test length has affected examinee performance at the 
population level or differentially across gender, racial/ 
ethnic, and best-language subgroups. 

RR No. 2006-9 Item No.: 060481980 42 pgs 

2006 $15 

The Effects of Essay Placement and Prompt Type 
on Performance on the New SAT 

Hyeon-Joo Oh and Michael E. Walker 
This study evaluated two questions: (1) whether essay 
placement (either at the beginning or the end of the 
test battery) affects test-takers’ performance on the 
critical reading, mathematics, and writing multiple- 
choice measures; and (2) whether essay prompt type 
(either a simple one-line prompt or a prompt including a 
short passage) affects test-takers’ essay performance. The 
results indicate that essay placement only affects test- 
takers’ performance on the essay itself, not on the other 
measures. Those who took the essay first performed 
better on the essay section than those who took the 
essay last. The one-line prompt and the contextual 
prompt have a similar impact on the test-takers’ essay 
performance. 

RR No. 2006-7 Item No.: 060481999 16 pgs 

2006 $15 


Using DIF Dissection Method to Assess Effects 
of Item Deletion 

Yanling Zhang, Neil J. Dorans, and 
Joy L. Matthews-Lopez 

Statistical procedures for detecting differential item 
functioning (DIF) are often used as an initial step to 
screen items for construct irrelevant variance. This 
research applies a DIF dissection method and a two- 
way classification scheme to SAT Reasoning Test verbal 
section data and explores the effects of deleting sizable 
DIF items on reported scores after re-equating. Two 
hypotheses are studied: (1) whether or not the deletion 
of a sizable DIF item that is the most disadvantageous 
to a particular subgroup will affect the scores for that 
subgroup the most; and (2) whether or not the effects 
of item deletion on scores can be predicted by the 
standardization method. Both hypotheses are supported 
by the results of this research. 

RR No. 2005-10 Item No.: 050481690 11 pgs 

2005 $15 

Evaluating SAT II: Mathematics IC Items 
in the SAT I Population 

Jinghua Liu, Fred Schuppan, and Michael E. Walker 
This study explored whether the addition of the SAT II: 
Mathematics Level IC Test (Math IC) items with more 
advanced math content to the SAT test would impact 
test-taker performance. The findings support the notion 
that test-taker performance is not affected by the mere 
presence of Math IC items. Rather, the effects of these 
items appear to be linked directly to the difficulty level of 
the items. 

RR No. 2005-3 Item No.: 040481376 11 pgs 

2005 $15 


Developing a Portfolio Assessment: 
Pacesetter Spanish 

Andrea Fercsey and Carmen Luna 

Portfolios are one of the assessment tools used in 
Pacesetter Spanish. In this study an attempt was made to 
develop a standardized portfolio assessment system. As 
part of this system, a set of guidelines and an assessment 
matrix were prepared, piloted, and analyzed. 

RR No. 99-2 Item No.: 200272 76 pgs 

1999 $15 

Factors in Performance on Brief, Impromptu 
Essay Examinations 

Hunter M. Breland, Marilyn W. Bonner, and 
Melvin Y. Kubota 

Brief, impromptu essays written for the 1990 
administration of the College Board’s English 
Composition Test (ECT) were randomly sampled 
and subjected to further holistic ratings beyond those 
conducted for the ECT administration, and analytical 
ratings were also obtained. The holistic scores were 
correlated with the analytical scores to determine which 
essay characteristics were most closely associated with 
high holistic scores. The results indicated that overall 
organization, use of supporting materials, noteworthy 
ideas, rhetorical strategy, and thesis statement were the 
strongest correlates. Essays combining current affairs 
with literature and history or combining literature and 
history received slightly higher scores on average than 
essays based only on current affairs, literature, history, 
or personal experience. The analysis suggests that 
some practice with this type of brief, impromptu essay, 
particularly under strict time constraints, may be useful 
as preparation for taking such essay examinations. 

RR No. 95-4 Item No.: 200887 36 pgs 

1995 $15 

Performance by Gender on an Unconventional 
Verbal Reasoning Task: Answering Reading 
Comprehension Questions Without the Passages 

Donald E. Powers 

The objective was to uncover any gender differences 
in approaches to and performance on a task requiring 
examinees to answer reading comprehension questions 
without reading the passages on which the questions 
were based. Data in this study were reanalyzed from a 
previously reported study of the passage dependence 
of reading comprehension questions being developed 
for the revised SAT. A few relatively small differences 
were detected between male and female test-takers. 
However, far more similarities than differences were 
noted with respect to both test performance and test- 
taking behavior. This would seem to suggest that males 
and females employ quite similar approaches to such 
reading comprehension tasks. 

RR No. 95-2 Item No.: 218096 8 pgs 

1995 $15 

Relationships Between Differential Performance 
on Multiple-Choice and Essay Sections 
of Selected AP Exams and Measures of 
Performance in High School and College 

Brent Bridgeman and Rick Morgan 

Some students appear to perform better on essay portions 
of Advanced Placement Program (AP) Examinations and 
less well on the multiple-choice portions, or vice versa. It 
is unclear whether students who are relatively strong on 
essays and weak on multiple-choice questions are more 
likely to succeed academically than students whose 
performance reflects the reverse pattern. Understanding 
these relationships may be useful not only for designing 
better assessment instruments but also for making 
optimal placement decisions. Thus a major purpose of 
the current study was to determine whether students 
with relatively high multiple-choice scores and low 
essay scores on AP Examinations were generally more 
successful in other testing situations and in college 
courses than students exhibiting the opposite pattern. 
The findings in this study are consistent with the 
conclusions of Bridgeman and Lewis (1994) indicating the 
roughly equal effectiveness of essay and multiple-choice 
tests in predicting course grades, and the superiority of 
multiple-choice scores for predicting success on other 
multiple-choice tests. 

RR No. 94-5 Item No.: 217851 10 pgs 

1994 $15 

Passage Dependence of the New SAT Reading 
Comprehension Questions 

Donald E. Powers and Susan T. Wilson 

It has been reasonably well established that test- 
takers can, to varying degrees, answer some reading 
comprehension questions correctly without reading the 
passages on which the questions are based. The new SAT 
places more emphasis on vocabulary within context (of 
reading passages). As a result, the use of reading scores, 
including those from the new SAT, has been challenged 
as a valid indicator of reading comprehension. The 
major aim of this study was to determine the strategies 
employed by examinees able to achieve better-than- 
chance performances without reading the passages. The 
research focused on a sample of reading comprehension 
questions similar to those that are used in the revised SAT, 
introduced in 1994. The results show that performance on 
the kinds of reading comprehension questions that make 
up the revised SAT does not appear to depend exclusively 
on information contained in the reading passages on 
which the questions are based. However, the importance 
of nonpassage factors appears to be relatively limited, 
especially in relation to the influence exerted by the 
reading passages. The desired interpretation of reading 
scores based on the new SAT reading comprehension 
questions does not seem unduly affected by examinees’ 
ability to benefit from information contained in the test 
questions themselves. 

RR No. 93-3 Item No.: 217849 18 pgs 

1993 $15 

Revising SAT-Verbal Items to Eliminate 
Differential Item Functioning 

W. Edward Curley and Alicia P. Schmitt 

Differential item functioning (DIF) statistics can be 
used to identify test questions on which the various 
focal (minority or female) and reference (white 
or male) populations perform differently. Since the 
mid-1980s, a series of DIF studies on the operational 
verbal sections of the SAT has been conducted to 
identify and assess the nature of the items on which 
DIF can be observed. Based on the initial SAT Verbal 
(SAT-V) pretest data and/or hypotheses advanced in the 
research literature, the authors selected seven sentence 
completions and 16 analogies with extreme levels of 
DIF and then systematically revised and readministered 
the items in an attempt to reduce or eliminate DIF. 
Several conclusions were drawn from the data analyzed 
in this investigation. First, revising and re-pretesting 
SAT-V items to eliminate DIF is feasible and likely to 
succeed often enough to make it practical to do so. 
Second, the particular terminology used in the stems 
and keys of analogies and sentence completions seems 
to be a significant source of elevated levels of DIF on the 
SAT-V. Third, to the extent possible, larger sample sizes 
for focal groups (particularly minority) would seem to be 
a desirable goal, since the stability of ETS DIF categories 
is reduced when the sample size is small. Fourth, for 
classifying the level of DIF (i.e., the “A,” “B,” and 
“C” categories), a combination of the Standardization 
p metric and the Mantel-Haenszel delta metric for very 
easy and very difficult items seems most effective. 

RR No. 93-2 Item No.: 217848 18 pgs 

1993 $15 

Sex-Related Performance Differences on 
Constructed-Response and Multiple-Choice 
Sections of Advanced Placement Examinations 

John Mazzeo, Alicia P. Schmitt, and Carole E. Bleistein 
A number of studies have indicated that the test 
performance of females relative to that of males was better 
on the constructed-response items than on the multiple- 
choice items. This report describes three exploratory 
studies of the performance of males and females on the 
multiple-choice and constructed-response sections of 
four Advanced Placement Program (AP) Examinations: 
United States History, Biology, Chemistry, and English 
Language and Composition. The studies were intended 
to evaluate some possible reasons for the apparent 
relationship between test format and the magnitude of 
gender-related differences in performance. The results 
suggest that the major factor accounting for the relatively 
better performance of females on constructed-response 
tests may be construct-relevant. Constructed-response 
tests likely demand different sets of competencies than 
their multiple-choice counterparts, and gender-related 
differences in performance profiles across the two modes 
of assessment most likely reflect real disparities in the 
average level of achievement obtained by males and 
females with respect to these different competencies. 

RR No. 92-7 Item No.: 217822 29 pgs 

1992 $15 

An Analysis of English Composition Test Essay 
Prompts for Differential Difficulty 

Mark Pomplun, David Wright, Napoleon Oleka, and 
Marilyn Sudlow 

The purpose of this study was to conduct a detailed 
analysis of the difficulty over time of the essay 
prompts for the College Board’s English Composition 
Achievement Test (ECT) with Essay. Differential 
difficulty was explored by considering whether the 
relationship between the reference groups — male and 
white students — and the focal groups — female, American 
Indian, Asian American, Asian American ESL (English 
as a Second Language), black, Hispanic, and Hispanic 
ESL students — had remained constant over the seven 
years studied. Only two ECT essays examined showed 
signs of differential performance of groups that were 
associated with specific essay features. In one essay, 
the topic of heroes and values may have favored groups 
more familiar with cultural values. In the other essay, 
the combination of an abstract topic with an ironic tone 
may have caused differential performance for those with 
lower language skills. 

RR No. 92-4 Item No.: 215445 45 pgs 

1992 $15 

A Study of Gender and Performance on 
Advanced Placement History Examinations 

Hunter M. Breland, with Despina O. Danos, Helen D. 
Kahn, Melvin Y. Kubota, and Marilyn W. Sudlow 
Several studies have shown that, on average, women 
perform slightly better than men on constructed- 
response tests, while men perform slightly better on 
multiple-choice tests. Studies of the Advanced Placement 
Program (AP) Examinations have revealed a similar 
phenomenon. For almost all AP Examinations, men 
average better on both parts of the tests, but gender 
differences on the free-response parts are almost always 
less, and for some tests they are nonsignificant. Two AP 
Examinations, U.S. History and European History, were 
selected for study because gender differences on the free- 
response portions of the test were nonsignificant while 
gender differences on the multiple-choice parts were 
large. Random samples of free-response booklets were 
drawn from the 1986 administrations of both exams. 
The responses were rated and analyzed on English 
composition quality, historical content, responsiveness, 
factual errors, handwriting quality, neatness, and 
number of words written. All variables were then used 
to predict the free-response scores. Several significant 
predictors were observed: the AP multiple-choice score, 
historical content, English composition quality, and 
the number of words written. The study suggests that 
format effects are real and cannot be attributed to bias in 
scoring or to totally irrelevant variables. When scoring 
was conducted analytically with a focus on historical 
content, no gender differences were observed in the free- 
response portions. This is the same result observed from 
the regular administration readings, which are graded 
holistically and by readers different from those used for 
this study. 

RR No. 91-4 Item No.: 218211 38 pgs 

1991 $15 

Comparative Validity of Multiple-Choice and 
Free-Response Items on the Advanced Placement 
Examination in Biology 

Brent Bridgeman 

The Advanced Placement Program (AP) reports grades 
to students and colleges on a 1-to-5-point scale derived 
by combining the separate scores on the multiple-choice 
and free-response sections of AP Examinations. This 
study investigated the effectiveness of the current scoring 
practices of reporting AP grades that are based on this 
combined grade. Correlations are generally higher and 
more comparable across gender with such composite 
grades than would be the case if only essays were used. 
The results of this study support the current practice of 
using both multiple-choice and essay scores to compute 
the 1-to-5 AP grade; if only essays were used, predictions 
of college course grades would be substantially less accurate. 

RR No. 89-2 Item No.: 273708 16 pgs 

1989 $15 

The Equivalence of Scores from Automated and 
Conventional Educational and Psychological 
Tests: A Review of the Literature 

John Mazzeo and Anne L. Harvey 

A literature review was conducted to determine the 
current state of knowledge concerning the effects of 
computer administration of standardized educational 
and psychological tests on the psychometric properties 
of these instruments. Studies were grouped according 
to a number of factors relevant to the administration of 
tests by computer. It was found that: (1) the rate at which 
test-takers omit items in an automated test may differ 
from the rate at which they omit items on a paper-and- 
pencil test; (2) scores on tests from automated versions of 
personality inventories are lower than scores obtained in 
the paper-and-pencil format; (3) scores from automated 
versions of speed tests are not likely to be comparable 
with scores from paper-and-pencil versions; (4) the 
presentation of graphics in an automated test may have 
an effect on score equivalence; (5) tests containing items 
based on reading passages can become more difficult 
when presented on a CRT; and (6) the possibility of 
such asymmetric practice effects may make it wise to 
avoid conducting equating studies based on single-group 
counterbalanced designs. 

RR No. 88-8 Item No.: 218111 27 pgs 

1988 $15 

Differential Item Functioning for Males and
Females on SAT-Verbal Reading Subscore Items 

Ida M. Lawrence, W. Edward Curley, and 
Frederick J. McHale 

The reading comprehension and sentence completion 
items from four forms of the SAT Verbal sections 
were examined for differential item functioning (DIF) 
between male and female test-takers. An important 
factor that appeared to be connected to DIF on reading 
comprehension items was the extent of technical 
information contained in reading passage material. 
Items associated with passages containing technical (as 
opposed to historical or philosophical) science material 
were generally more difficult for female examinees. 
The main factor that appeared to be related to DIF on 
sentence completion items was the distinction between 
nonscience or surface science references and true science 
references. Items containing true science references 
tended to be more difficult for females. 

RR No. 88-4 Item No.: 218277 55 pgs 

1988 $15 

Remote Scoring of Essays 

Hunter M. Breland and Robert J. Jones 
Essays written by college freshmen on two different topics 
were scored first by readers working in a conference 
setting and second by another set of readers working in 
their own homes or offices. The conference readers were 
trained in the standard manner on the specific topics 
to be scored and were monitored by table leaders, as is 
done in standard scoring procedures. The remote readers 
received only written instructions in the mail, and there 
was no monitoring of their scoring. The study compares 
the efficiency and accuracy of both scoring methods. 
Results suggested that calibrated remote scores offer 
promise but that they cannot be considered equivalent to 
conference scores in terms of either reliability or validity. 
On the other hand, score discrepancies — and thus the 
need for adjudication — can be substantially decreased 
through calibration. 

RR No. 88-3 Item No.: 217762 39 pgs 

1988 $15 


Three Studies of SAT-Verbal Item Types 

William B. Schrader 

A recent finding that the reading subscores on the verbal 
sections of the SAT have substantially higher validity 
than the vocabulary subscores has stimulated interest in 
the four item types on the SAT Verbal section. This report 
provides a summary of data from item and test analysis 
on: (1) the difficulty level and patterns of nonresponse 
for the four item types, (2) the extent to which each 
item type supplies items having both a relatively high 
difficulty level and a reasonably high biserial correlation, 
and (3) the true-score intercorrelations of the four item
types. 

RR No. 84-7 Item No.: 275883 45 pgs 

1984 $15 

The Direct Assessment of Writing Skill: A 
Measurement Review 

Hunter M. Breland 

Direct assessment of writing skill, usually considered 
to be synonymous with assessment by means of writing 
samples, was reviewed in terms of its history and with 
respect to existing evidence of its reliability and validity. 
Reliability was examined as it is influenced by reader 
inconsistency, domain sampling, and other sources of 
error. Evidence of validity is provided by relationships 
between direct assessment scores and criteria such as class 
rank, English course grades, and instructors’ ratings of 
writing ability. Direct assessment of writing also exhibits 
incremental validity over and above other available 
measures. It was concluded that direct assessment makes 
a contribution but that methods need to be developed to 
improve its reliability and reduce its costs. 

RR No. 83-6 Item No.: 275874 23 pgs 

1983 $15 


Perceptions of Writing Skill 

Hunter M. Breland and Robert J. Jones 
A random sample of 806 essays was taken from over 
80,000 essays written for the College Board’s English 
Composition Achievement Test (ECT) during 
December 1979. Using a special taxonomy of 20 
writing characteristics, these essays were subjected to a 
second special reading to determine which of these 20 
characteristics most influenced judgments of writing 
quality. The results showed that certain characteristics 
of discourse, including organization, transition, use 
of supporting evidence, and the originality of ideas 
presented, influenced judgments the most. 

RR No. 82-4 Item No.: 275864 68 pgs 

1982 $15 

Psychometric Research on
College Board Tests 

An Investigation of Scale Drift for Arithmetic 
Assessment of ACCUPLACER® 

Hui Deng and Gerald Melican 

This study extends the literature on scale drift in computerized
adaptive testing (CAT) as part of improving the quality control
and calibration process
for ACCUPLACER®, a battery of large-scale adaptive 
placement tests. The study aims to evaluate item 
parameter drift using empirical data that span four years 
from the ACCUPLACER Arithmetic assessment. The 
results suggest that the Arithmetic test maintained a 
reasonably stable scale in the years 2004 through 2007. 
RR No. 2010-2 Item No.: 10b-1418 7 pgs 

2010 $15 

Validating Cognitive Models of Task 
Performance in Algebra on the SAT 

Mark Gierl, Jacqueline Leighton, Changjiang Wang, 
Jiawen Zhou, Rebecca Gokiert, and Adele Tan 
The purpose of the study is to present research focused 
on validating the four algebra cognitive models in Gierl, 
Wang, et al., using student response data collected with 
protocol analysis methods to evaluate the knowledge 
structures and processing skills used by a sample of 
SAT takers. 

RR No. 2009-3 Item No.: 090482922 35 pgs 

2009 $15 

Differential Validity and Prediction of the SAT 

Krista D. Mattern, Brian F. Patterson, Emily J. Shaw, 
Jennifer L. Kobrin, and Sandra M. Barbuti 
The purpose of the study is to examine the differential 
validity and prediction of the SAT using a nationally 
representative sample of first-year college students 
admitted with the revised version of the SAT. The 
findings demonstrate that there are similar patterns 
of differential validity and prediction by gender, race/ethnicity,
and best language subgroups on the revised
SAT compared with previous research on older versions 
of the test. 

RR No. 2008-4 Item No.: 080482567 12 pgs 

2008 $15 

Time Requirements for the Different Item Types 
Proposed for Use in the Revised SAT 

Brent Bridgeman, Cara Cahalan Laitusis, and Frederick 
Cline 

The current study used three data sources to estimate 
time requirements for different item types on the
now-current SAT Reasoning Test. First, we estimated times
from a computer-adaptive version of the SAT (SAT CAT) 
that automatically recorded item times. Second, we 
observed students as they answered SAT questions under 
strict time limits and recorded the amount of time taken 
for each question. Finally, we asked high school students 
to record the amount of time taken for test subsections 
that were composed of items of a single type. The rules 
of thumb used by test developers were quite accurate in 
rank ordering the item types from least to most time 
consuming, but the time actually spent was generally 
higher than assumed in the rules of thumb. 

RR No. 2007-3 Item No.: 07-1941 21 pgs 

2007 $15 

Monitoring Reader Performance and DRIFT 
in the AP English Literature and Composition 
Examination Using Benchmark Essays 

Edward W. Wolfe, Carol M. Myford, George Engelhard Jr., 
and Jonathan R. Manalo 

In this study, we investigated a variety of Reader effects 
that may influence the validity of ratings assigned to AP 
English Literature and Composition essays. Specifically, 
we investigated whether Readers exhibit changes in 
their levels of severity and accuracy, and their use of 
individual scale categories over time. We refer to changes 
in these characteristics of Readers as Differential Reader 
Functioning over Time (DRIFT). Our literature review
points out several weaknesses in the way Reader effects 
have been addressed in prior studies, and the study 
sought to address several of those weaknesses. 

RR No. 2007-2 Item No.: 070482285 39 pgs 

2007 $15 

Examination of Fatigue Effects from Extended- 
Time Accommodations on the SAT Reasoning 
Test 

Cara Cahalan Laitusis, Deanna L. Morgan, Brent 
Bridgeman, Jennifer Zanna, and Elizabeth Stone 
This study examined operational data from the SAT 
Reasoning Test to determine if students who tested under 
extended-time conditions were suffering from excessive 
fatigue relative to students who tested under standard- 
time conditions. Excessive fatigue was defined by 
significant (a) increases in differential item functioning 
(DIF) and (b) decreases in item completion rates, for items 
at the end of testing compared to the beginning of testing. 
Both of these factors were examined by comparing the 
performance of students who tested with extended time 
on items administered early (section position 2 of 3) and 
different items administered late (section position 8, 9, 
or 10) during the 10-section test administration. The 
sample included students with learning disabilities and/or
Attention-Deficit/Hyperactivity Disorder (ADHD)
who tested with extended time (time and a half or double 
time) and students without disabilities who tested under 
standard-time conditions. Analyses were conducted on 
the critical reading and writing sections of the SAT and 
examined item difficulty as well as item completion 
rates. Results indicated few changes in levels of DIF 
(early in the test compared to late in the test). In addition, 
item completion rates for students who received extra 
time were comparable to (or in some cases higher than) 
test-takers without disabilities who tested under standard
time on both early and late sections. 

RR No. 2007-1 Item No.: 07-0873 13 pgs 

2007 $15 

The Relationship Between PSAT/NMSQT Scores
and AP Examination Grades: A Follow-Up Study 

Maureen Ewing, Wayne J. Camara, and Roger E. Millsap 
The purpose of this study is to reexamine the relationship 
between PSAT/NMSQT scores and Advanced Placement 
(AP) Examination grades, previously studied by Camara 
and Millsap (1998), but using more recent test data 
in order to obtain additional validation evidence for 
using the PSAT/NMSQT to identify AP students. The 
results show that one or more PSAT/NMSQT scores 
were moderately to strongly correlated to grades on all 
AP Examinations with the exception of four exams. 
The exceptions were: (1) German Language, (2) Spanish 
Language, (3) Studio Art, and (4) Studio Art: 2-D Design. 
Camara and Millsap (1998) found similar results. 

RR No. 2006-1 Item No.: 050481750 28 pgs 

2006 

Identifying Content and Cognitive Dimensions 
on the SAT 

Mark J. Gierl, Xuan Tan, and Changjiang Wang 

Researchers used both statistically based dimensionality 
analyses and content-based substantive analyses to 
identify and interpret the cognitive dimensions measured 
on the mathematics and critical reading sections of the 
SAT. 

RR No. 2005-11 Item No.: 050481691 31 pgs 

2005 $15 

The Impact of Extended Time on 
SAT Test Performance 

Ellen B. Mandinach, Brent Bridgeman,
Cara Cahalan-Laitusis, and Catherine Trapani
This study explored the impact of providing standard time, 
time and a half with and without specified section breaks, 
and double time without specified section breaks on the 
verbal and mathematics sections of the SAT. Differences 
among ability, disability, and gender groups were examined. 
Results indicated that time and a half with separately timed 
sections benefits students with and without disabilities. 
Some extra time improves performance, but too much
may be detrimental. Extra time benefits medium- and
high-ability students but provides little or no advantage to
low-ability students.

RR No. 2005-8 Item No.: 050481688 35 pgs 

2005 $15 

Invariance of Linkings of the Revised 2005 SAT 
Reasoning Test to the SAT I: Reasoning Test 
Across Gender Groups 

Jinghua Liu, Miriam Feigenbaum, and Neil J. Dorans 

Score equity assessment was used to evaluate linkings 
of the new SAT to the SAT I: Reasoning Test. The 
results indicated that the conversion lines obtained 
through subgroup-only linkings were very similar to 
those obtained using the total group linking for both 
critical reading and math prototypes. Hence, on the basis 
of field trial data, it appears that population invariance 
was achieved with respect to gender groups. 

RR No. 2005-6 Item No.: 050481413 14 pgs 

2005 $15 

A Study of Fatigue Effects from the New SAT 

Jinghua Liu, Jill R. Allspach, Miriam Feigenbaum, 
Hyeon-Joo Oh, and Nancy Burton 

This study evaluated whether the addition of a writing 
section to the SAT would impact test-taker performance 
because of fatigue caused by increased test length. The 
results indicated that while the extended testing time 
for the new SAT may cause test-takers to feel fatigued, 
fatigue did not affect test-taker performance. 

RR No. 2004-5 Item No.: 040481305 13 pgs 

2004 $15 


Beyond Individual Differences: Exploring School 
Effects on SAT Scores 

Howard T. Everson and Roger E. Millsap 
This report explores the complex, hierarchical 
relationship among school characteristics, individual 
differences in academic achievement, extracurricular 
activities, and socioeconomic background on 
performance on the SAT. Analyses suggest that 
multilevel structural equation models provide a
reasonably good fit to the data, that family background 
influences SAT scores directly and indirectly, that
learning opportunities in and outside of the school 
curriculum are related to SAT performance, and that 
the characteristics of the schools matter when it comes 
to performance on the SAT. 

RR No. 2004-3 Item No.: 040481303 18 pgs 

2004 $15 

A Simulation Study to Explore Configuring 
the New SAT Critical Reading Section Without 
Analogy Items 

Jinghua Liu, Miriam Feigenbaum, and Linda Cook 

This study explored possible configurations of the new 
SAT I: Critical Reading Test without analogy items. 
The item pool contained items from 14 previously 
administered SAT Verbal tests, calibrated using the 
three-parameter logistic IRT model. Multiple versions 
of several prototypes that do not contain analogy items 
were assembled. Item statistics and test statistics for 
the simulated forms were compared to the average of 
13 forms of the SAT. These statistics included: IRT 
scaled score reliability, scaled score standard error of
measurement, conditional scaled score standard error of 
measurement, r-biserial correlations, and equated deltas. 
The results indicated that it is possible to maintain 
measurement precision for the new SAT critical reading 
section without analogy items, but it may be necessary 
to modify the distribution of item difficulty in order to 
obtain adequate precision at the ends of the score scale. 
RR No. 2004-2 Item No.: 030481025 12 pgs 

2004 $15 


Linking Scores from Tests of Similar Content 
Given in Different Languages: Spanish Language 
PAA™ and English Language SAT I

Alicia S. Cascallar and Neil J. Dorans 
Score linkages between the Verbal and Math sections of 
the SAT I: Reasoning Test and the corresponding sections 
of the new version of a Spanish-language admissions
test, the Prueba de Aptitud Academica (PAA™), were
investigated. A bilingual group design was employed. 
A language proficiency measure (ESLAT) was used to 
define the bilingual group and as a predictor variable. 
Prediction and scaling for concordance results were 
compared. Results indicated that for both single (PAA 
Verbal or PAA Math to the corresponding SAT I scores) 
and composite (PAA-V+M to SAT I-V+M and PAA- 
V+M+ESLAT to SAT I-V+M) score linkage, prediction 
is preferable to concordance. Comparison of prediction 
and concordance results for composite scores versus 
single construct scores indicates that when PAA Verbal 
is combined with PAA Math to form a composite, 
predictions of this composite are better than for Verbal 
alone but worse than predictions for Math alone. 

RR No. 2003-5 Item No.: 998644 11 pgs 

2003 $15 

A Historical Perspective on the Content of the 
SAT 

Ida M. Lawrence, Gretchen W. Rigol, Thomas Van Essen, 
and Carol A. Jackson 

This paper provides a historical perspective on the content 
of the SAT. The review begins at the beginning, when the 
first College Board SAT (the “Scholastic Aptitude Test”) 
was administered to 8,040 students on June 23, 1926. At 
that time, the SAT consisted of nine subtests: definitions, 
arithmetical problems, classification, artificial language, 
antonyms, number series, analogies, logical inference, 
and paragraph reading. Over the years, the SAT has 
evolved in the way it measures what is now referred 
to as verbal and mathematical “reasoning.” With each 
redesign of the SAT, a variety of considerations were 
taken into account, including fairness issues, scaling 
issues, cost, public perception, face validity, changes 
in the test-taking population, changes in patterns of 
test preparation, and changes in the college admissions
process. This paper describes the reasons for the various 
changes while emphasizing that the value of SAT scores 
rests on the test’s high technical quality, and on the 
assumption that scores would maintain their meaning 
over time. 

RR No. 2003-3 Item No.: 997274 19 pgs 

2003 $15 

Monitoring Faculty Consultant Performance 
in the Advanced Placement English Literature 
and Composition Program with a Many-Faceted 
Rasch Model 

George Engelhard, Jr. and Carol M. Myford 
The purpose of this study was to examine, describe, 
evaluate, and compare the rating behavior of faculty 
consultants who scored essays written for the Advanced 
Placement English Literature and Composition (AP 
ELC) Exam. Data from the 1999 AP ELC Exam were 
analyzed using FACETS (Linacre, 1998) and SAS. The 
faculty consultants were not all interchangeable in terms 
of the level of severity they exercised. If students’ ratings 
had been adjusted for severity differences, the AP grades 
of about 30 percent of the students would have been
different from the ones they received. Almost all the
differences were one grade or less. Adjusting ratings for 
faculty consultant severity differences would impact 
some student subgroups more than others. 

RR No. 2003-1 Item No.: 995947 60 pgs 

2003 $15 

The Recentering of SAT Scales and Its Effects on 
Score Distributions and Score Interpretations 

Neil J. Dorans 

This report summarizes the history of SAT score scales, 
outlines the need for realigning SAT score scales, and 
explains how scores were converted from original 
SAT scales to recentered scales. Issues associated with 
converting recentering from a possibility into a reality 
are discussed. 

RR No. 2002-11 Item No.: 040481187 21 pgs 

2002 $15 


An Investigation of the Validity of AP Grades of 
3 and a Comparison of AP and Non-AP Student 
Groups 

Barbara G. Dodd, Steven J. Fitzpatrick, R. J. De Ayala, 
and Judith A. Jennings 

The purpose of this study was to address the validity 
of grades of 3 on AP Examinations and to compare 
AP students to other relevant student groups. While 
research has shown that students who earn grades 
of 3 or higher and place out of introductory courses 
do well in the subsequent courses, there are some 
college faculty members who think this is not always 
the case. To address this issue, a number of different 
statistical techniques were employed to determine if 
finer gradations of the grade group of 3s might prove 
useful for course placement in college. The findings of 
this study did not support finer gradations of the AP 
score category of 3. It was also found that AP students 
who earn credit by examination tend to make the same 
or higher grades in subsequent courses than do the other 
comparison groups. 

RR No. 2002-9 Item No.: 995384 57 pgs 

2002 $15 

The Utility of the SAT I and SAT II for 
Admissions Decisions in California and 
the Nation 

Wayne J. Camara, Glenn B. Milewski, and 
Jennifer L. Kobrin 

This study examines the relative utility and predictive 
validity of the SAT I and SAT II for various subgroups in 
both California and the nation. The effect of eliminating 
the SAT I on the test impact and on the over- and 
underprediction of various gender and racial/ethnic 
subgroups is examined. 

RR No. 2002-6 Item No.: 994217 28 pgs 

2002 $15 

The Performance Assessment Study in Writing:
Analysis of the SAT II: Writing Test 

Hunter M. Breland, Melvin Y. Kubota, and Marilyn Bonner
This study examined the SAT II: Writing Test as a 
predictor of writing performance in college English 
courses. Special attention was given to comparisons of 
the predictive effectiveness of the essay and multiple- 
choice components of the test. It was concluded that 
both components were good predictors; however, the 
longer 40-minute multiple-choice component tended 
to produce higher predictive correlations than the 20- 
minute essay component. The best predictions were 
obtained when the two components were combined. 

RR No. 99-4 Item No.: 275900 24 pgs 

1999 $15 

Correspondences Between ACT and SAT I Scores 

Neil J. Dorans 

Correspondences between ACT and SAT I scores 
are presented from a conceptual framework that 
distinguishes among three kinds of correspondences: 
equating, scaling, and prediction. Relationships among 
the different scales of the ACT and SAT I are described 
in the context of the conceptual framework. Sums of 
scores, composites of scores, and individual scores are 
examined. 

RR No. 99-1 Item No.: 200273 24 pgs 

1999 $15 

Using the PSAT/NMSQT and Course Grades in 
Predicting Success in the Advanced Placement 
Program 

Wayne J. Camara and Roger Millsap 
This study reports that student performance on the 
PSAT/NMSQT can be useful in identifying additional 
students who may be successful in Advanced Placement 
Program (AP) courses. PSAT/NMSQT scores can identify 
students who may not have been initially considered 
for an AP course through teacher nomination,
self-nomination, or other local procedures. Only four
examinations have AP grades that are not strongly related
to PSAT/NMSQT performance: (1) studio art: design, (2) studio
art: drawing, (3) German language, and (4) Spanish
language. The relationship of PSAT/NMSQT scores with
other AP Examination grades is moderately strong and 
invariant across ethnic groups and time of testing. That 
is, the relationship is substantially the same for all ethnic 
and racial groups and is only slightly weaker when time 
between testing spreads across two academic years. 

RR No. 98-4 Item No.: 040481183 20 pgs 

1998 $15 

Methods Used to Establish Score Comparability 
on the Enhanced ACT Assessment and the SAT 

Gary L. Marco, A. A. Abdel-Fattah, and 
Patricia A. Baron 

Marco and Abdel-Fattah (1991) reported newly established 
relationships between scores on the enhanced American 
College Testing Program (ACT) Assessment and scores 
on the SAT. Fourteen large universities provided data 
on applicants who had taken both the enhanced ACT 
Assessment and the SAT. The report provides a detailed 
description of the methodology used to develop the 
“concordance” tables reported in the 1991 study, as well 
as the methods used to establish comparability between 
scores on the ACT Composite from the enhanced 
ACT Assessment and scores on the SAT-V and SAT-M 
composite (SAT-V + M). The results should aid test users 
in attempting to compare the performance of students 
taking these different tests. 

RR No. 92-3 Item No.: 215444 19 pgs 

1992 $15 

Sex Differences in Problem-Solving Strategies 
Used by High-Scoring Examinees on the SAT-M 

Ann M. Gallagher 

Gender differences in mathematical performance are 
well documented, although the hypothesized causes 
of these differences are varied. The research presented 
here seeks to add to our understanding of the nature 
of gender differences in performance on standardized 
mathematics tests. An item classification scheme 
developed by Gallagher (1990) was refined, resulting 
in a more accurate prediction of gender differences 
in performance on the mathematics test. Structured 
interviews were conducted with high-scoring students (25
male and 22 female) to determine the nature
of differences in strategy use. Findings described in this
report offer direct support for the notion that at least 
a portion of the differences among high scorers can be 
attributed to differences in strategy use. Females in this 
group appeared to depend more heavily than males on 
standard algorithmic strategies that are generally taught 
in the classroom; males were more apt to use insight in 
their solutions. Both male and female students who used 
more algorithmic strategies tended to rate mathematics 
as more difficult and less relevant to their lives. 

RR No. 92-2 Item No.: 215443 35 pgs 

1992 $15 

Comparability of Computer and Paper-and-Pencil
Scores for Two CLEP® General
Examinations 

John Mazzeo, Barry Druesne, Paul C. Raffeld,
Keith T. Checketts, and Alan Muhlstein
This report describes two studies that investigated the 
comparability of scores from pencil-and-paper and 
computer-administered versions of the College-Level 
Examination Program (CLEP) General Examinations 
in Mathematics and English Composition. The first 
study used a prototype computer-administered version 
of each examination. Based on the results of the first 
study and feedback from the study participants, several 
modifications were made to these prototype versions. 
A second study was then conducted using the modified 
computer versions. Both studies used a single-group 
counterbalanced equating design. The results of Study 1 
suggest that, despite efforts to design computer versions 
of the CLEP Mathematics and English Composition 
General Examinations that were administratively 
similar to the paper-and-pencil examinations, mode-of- 
administration effects were found. The results of Study 
2 suggest that the modifications made to the computer 
versions eliminated the mode-of-administration effects 
for the English Composition Examination but not for the 
Mathematics Examination. The results of both studies 
underscore the need to determine empirically (rather 
than to just assume) the equivalence of computer and 
paper versions of an examination. 

RR No. 91-5 Item No.: 218205 18 pgs 

1991 $15 

Cohort Differences Associated with Trends
in SAT Score Averages 

Rick Morgan 

Throughout the 1970s, average SAT scores declined. 
However, after 1980 the SAT-M and the SAT-V rose
until 1986, when the SAT-V started a steady decline. The
potential impact of cohort changes with regard to ethnic
group, gender, class rank, and first language learned
on yearly average SAT scores and Test of Standard
Written English (TSWE) scores was studied. Regression
and cross-classification analyses were conducted on
data from 1985 and 1987 to 1990. The results suggest
that approximately half of the decline in SAT verbal
scores was associated with cohort change. Average SAT
mathematics scores could possibly have risen by three
points, rather than remaining constant, if the 1987 and
1990 cohorts were the same.

RR No. 91-1 Item No.: 217801 26 pgs 

1991 $15 

Sex Differences in the Performance of High- 
Scoring Examinees on the SAT-M 

Ann M. Gallagher 

Performance of high-scoring males and females on 
the mathematics section of three forms of the 
SAT-M was examined to determine how item content, 
solution strategy, and speededness differentially affect 
performance. The mathematical and verbal sections 
of the SAT were also compared for similarities in the 
performance patterns of high scorers. Conventional 
measures indicated that the SAT-M was not differentially 
speeded. However, females omitted a greater proportion 
of items requiring estimation. Different patterns by 
gender were found on the mathematical and verbal 
sections of the test. 

RR No. 90-3 Item No.: 218151 16 pgs 

1990 $15 

Changes in the SAT-Verbal: A Study of Trends in 
Content and Gender References 1961-1987 

Pamela I. Cruise and Ernest W. Kimmel 

Since 1972, the average SAT-Verbal score for men has been 
higher than that for women, although the widely held 
perception has been that women do better than men on
tests of verbal ability. The purpose of this study was to create 
a detailed history of the content of the SAT-Verbal sections 
over more than two decades and to examine changes over 
time in the content of the test and trends over time in the 
balance of references to and representations of women and 
men within the content of the test. It was found that while 
there had been some changes in the structure of the test, 
none of the changes appeared to affect the basic balance of 
content between those areas thought to favor women and 
those thought to favor men. 

RR No. 90-1 Item No.: 254870 32 pgs 

1990 $15 

Examining the Relationship Between 
Differential Item Functioning and Item 
Difficulty 

Edward Kulick and P. Gillian Hu 

This study examined the relationship of differential 
item functioning (DIF) to item difficulty on the SAT. 
The data comprised verbal and mathematical item 
statistics from nine administrations of the SAT. In 
general, item difficulty was related to DIF. The nature 
of that relationship appeared to be independent of the 
choice of DIF index (either the Mantel-Haenszel or the 
standardized approach) as well as of test form. However, 
the relationship was dependent on the particular group 
comparison and on both the test sections and the item 
type being analyzed. Among other findings, for instance, 
was that Hispanic and black focal groups tended to omit 
differentially less than did the white reference groups. For 
Asian American examinees, the reverse held. For females
and males, the direction depended on the test sections. 

RR No. 89-5 Item No.: 295732 31 pgs 

1989 $15 

Equating the Scores of the Prueba de Aptitud 
Academica and the Scholastic Aptitude Test 

William H. Angoff and Linda L. Cook
The present study is a replication of an earlier study 
conducted by Angoff and Modu (1973) to develop 
algorithms for converting scores expressed on the SAT 
scale to scores expressed on the College Board Prueba 
de Aptitud Academica (PAA) scale, and vice versa. 
However, some differences in procedures used in these
two studies are worth noting, and this report contributes 
both in substance and method to the translation and 
equating of tests. The method involved two phases: (1) 
the selection of test items equally appropriate and useful 
for English- and Spanish-speaking students for use as an 
anchor test in equating the two tests; and (2) the equating 
analysis itself. The equating showed definite curvilinear 
relationships in both verbal and mathematical tests, 
indicating in this instance that both sections of the PAA 
are easier than the corresponding SAT sections. The 
results also showed good agreement between the current 
conversions and the 1973 Angoff-Modu conversions for 
the mathematical tests, but not so close agreement for 
the verbal tests. 

RR No. 88-2 Item No.: 217763 18 pgs 

1988 $15 

The Validity of Various Methods of Treating 
Multiple SAT Scores 

R. F. Boldt, J. A. Centra, and R. G. Courtney 

A review of the literature concerned with validity data 
and policies for various methods of treating multiple 
SAT scores is reported, as are analyses of data from the 
College Board’s Validity Study Service. Data from the 
Student Descriptive Questionnaire (SDQ) were cross- 
tabulated with the number of retests by SAT takers. The 
analysis evaluated the use of SAT verbal score alone, 
SAT math score alone, and the use of both scores in 
combination. The methods for treating multiple scores 
were to use the score from the administration with the 
highest combined verbal and math (V + M) score, the 
highest individual score, the most recent score, and the 
average. Best results, in terms of the highest average 
validity, were achieved using V + M. All treatments of 
multiple scores resulted in underprediction of actual 
grades, with the highest score providing the least 
amount of underprediction. However, the discrepancy 
between predicted and actual grades varied greatly 
across institutions. From the data in this study, the 
decision as to which is the most preferable treatment of 
multiple scores seems to depend on how one evaluates 
the discrepancy differences as compared to the validity 
differences. 

RR No. 86-4 Item No.: 275893 8 pgs 

Four Years Later: A Longitudinal Study of
Advanced Placement Students in College 

Warren W. Willingham and Margaret Morris 
The Advanced Placement Program (AP) provides a 
means for students to take college-level work in secondary
school, sit for standard end-of-course examinations, and, 
if successful, be placed ahead with college credit. Using 
data from the Personal Qualities Project, this study 
examined various aspects of the experience and success 
of 1,115 AP students through four years of college. 
AP students, when compared with non-AP students 
matched on six preadmission measures, were found to 
have better academic records and to be more successful 
overall. It was also found that students taking an AP 
Examination in a given subject area were more likely to 
take college course work in that area than students who 
had not done so. 

RR No. 86-2 Item No.: 275892 46 pgs 

1986 $15 

Student Change, Program Change: Why the SAT 
Scores Kept Falling 

William W. Turnbull 

The first leg of the SAT score decline occurred mainly 
in the 1960s, which seemed to be explained fairly 
satisfactorily by the evidence that the composition of 
the test-taking group had changed to include a larger 
proportion of students with relatively low-developed 
ability, mirroring the increased interest in college 
for teenagers. In studies made during the 1970s, no 
comparable underlying change was found to explain the 
second period of the score decline, which was ascribed 
instead to a mix of factors — “pervasive influences” — in 
both school and society. The importance of pervasive 
societal influences on student learning is not in dispute. 
In this study a variety of data suggests, however, that 
the increase in school retention rates of poorly prepared 
students and the resulting heterogeneity of the senior 
high school population is a unifying explanatory variable 
for the second leg of the decline as well as the first. 

RR No. 85-2 Item No.: 217828 10 pgs 

1985 $15 


Considerations for Developing Measures 
of Speaking and Listening 

Donald E. Powers 

The College Board has identified several basic intellectual 
competencies thought to be essential for effective work 
in all fields of college study, among them listening and 
speaking. An issue that arises in connection with these 
competencies is the availability of suitable measures 
to assess students’ development in these areas. This 
report considers the availability and adequacy of existing 
measures of speaking and listening, and discusses a 
number of issues that should be considered in any effort 
to develop new measures of these skills. 

RR No. 84-5 Item No.: 275881 9 pgs 

1984 $15 

Test Disclosure and Retest Performance 
on the Scholastic Aptitude Test 

Lawrence J. Stricker

Public disclosure of the content of admission tests, 
originally mandated by legislation in New York and 
now a nationwide policy of many admission testing 
programs, has potentially important consequences for 
the performance of examinees. The aim of this study 
was to evaluate the effect of disclosing an SAT form 
on the retest performance of examinees who had been 
tested initially with the disclosed form and subsequently 
retested with a different form. Access to the disclosed test 
material had no appreciable effects on subsequent retest 
performance — whether that performance was defined in 
terms of the level, stability, or concurrent validity of the 
new scores. 

RR No. 82-7 Item No.: 275867 10 pgs 

1982 $15 


Internal Construct Validity of the Career Skills 
Assessment Program 

Donald A. Rock 

The primary purposes of this study were to examine 
evidence of the construct validity of the Career Skills 
Assessment Program (CSAP) instrument and to present a 
systematic procedure for carrying out internal construct 
validity studies for any testing instrument. 

RR No. 81-10 Item No.: 275860 16 pgs 

1981 $15 

Measurement Error and SAT Score Change 

Donald L. Alderman 

Score changes on admission tests such as the SAT can 
vary widely among individuals repeating the test. To 
a large extent these score changes reflect the influence 
of errors of measurement because test candidates with 
low initial scores usually experience score gains upon 
retesting while test candidates with high initial scores 
often show score losses. This study applied a procedure 
to estimate the true-score change on the SAT adjusted 
for regression effects and student self-selection. It was 
shown that student self-selection in deciding to repeat
an admission test probably involves factors (in addition 
to the measurement error) attributable to variations 
in aspects of test specifications and to variations in 
responses across forms. In addition, it was found that 
the estimated true-score change remains nearly constant 
across initial score levels in contrast to the negative slope 
of observed score change across initial score levels. 

RR No. 81-9 Item No.: 275859 15 pgs 

1981 $15 

An Application of Item Response Theory to
Equating the Test of Standard Written English 

Isaac I. Bejar and Marilyn S. Wingersky 
This report describes a feasibility study for using Item Response
Theory (IRT) as a means of equating the Test of Standard 
Written English (TSWE). The study examined the 
possibility of pre-equating, that is, deriving the equating 
transformation prior to the final administration of the 
test. The three-parameter logistic model was postulated 
and found to portray the data well. The adequacy of 
equating provided by IRT procedures was investigated 
in two TSWE forms. It was concluded that pre-equating 
does not appear to present problems beyond those 
inherent to IRT-equating. 

RR No. 81-8 Item No.: 275858 28 pgs 

1981 $15 

Student Self-Selection and Test Repetition

Donald L. Alderman 

Student self-selection in deciding to repeat a test was
examined by contrasting the test performance of students 
taking the SAT as juniors and again as seniors with 
the test performance of students taking the SAT once 
only, as juniors. Residuals of observed minus expected 
test scores revealed statistically significant differences 
between the two groups of students. Results indicated 
that self-selection occurs when students decide to repeat 
a test and that score changes among these students 
reflect negative errors of measurement on the initial test 
administration as well as other factors. 

RR No. 81-5 Item No.: 275855 10 pgs 

1981 $15 


Effects of Different Methods of Weighting 
Subscores on the Composite-Score Ranking of 
Examinees 

Christopher C. Modu 

The effects of applying different methods of determining 
different sets of subscore weights on the composite-score 
ranking of examinees were investigated. Four sets of 
subscore weights were applied to examination results 
for several College Board Achievement Tests. One set 
was determined in advance of the test administration; 
the other three sets were generated after the tests were 
scored. Results showed few differences in weighting 
procedures. The appeal for the set generated in advance 
derives from its time- and cost-saving considerations. 

RR No. 81-2 Item No.: 275852 11 pgs 

1981 $15 

General Research and Policy

Common Core State Standards Alignment: 
ReadiStep™, PSAT/NMSQT, and SAT 

Natasha Vasavada, Elaine Carman, Beth Hart, and 
Danielle Luisier 

This alignment study was conducted to demonstrate 
the existing correspondence between assessments in 
the College Board College Readiness Pathway and the 
Common Core State Standards. The College Board 
College Readiness Pathway is comprised of ReadiStep™,
the PSAT/NMSQT, and the SAT. 

RR No. 2010-5 Item No.: 10b-2901 7 pgs 

2010 $15 

The Development of a Multidimensional College 
Readiness Index 

Andrew Wiley, Jeffrey Wyatt, and Wayne J. Camara 

This report presents a methodology for the measurement 
and tracking of the college readiness level of high school 
students who are engaged in the college admission 
process. The proposed index uses the three distinct 
hurdles of SAT scores, high school GPA, and a newly 
developed measure of academic rigor. 

RR No. 2010-3 Item No.: 10b-3110 25 pgs 

2010 $15 

What Should Students Be Ready For in College? 
A Look at First-Year Course Work in Four-Year 
Postsecondary Institutions in the U.S. 

Emily J. Shaw and Brian F. Patterson 
This study examined the English, mathematics, and 
natural sciences course work taken by students in their 
first year of college. Four-year postsecondary institutions 
(k = 110) provided first-year performance data for the 
first-time, first-year students who began college in the
fall of 2006. As in previous research, composition is the 
most commonly taken English course. However, calculus 
was more popular than algebra within mathematics, and 
chemistry was more popular than biology within the
natural sciences, both different findings from previous
analyses of first-year college course work in those content 
areas. 

RR No. 2010-1 Item No.: 10b-1417 16 pgs 

2010 $15 

Examining the Accuracy of Self-Reported High 
School Grade Point Average 

Emily J. Shaw and Krista D. Mattern 

This study examined the relationship between students’ 
self-reported high school grade point average (HSGPA) 
from the SAT Questionnaire and their HSGPA provided 
by the colleges and universities they attend. The purpose 
of this research was to offer updated information on 
the relatedness of self-reported (by the student) and 
school-reported (by the college/university from the high 
school transcript) HSGPA, compare these results to prior 
studies and provide recommendations on the use of 
self-reported HSGPA. Results from this study indicated 
that even though the correlation between the self- 
reported and school-reported HSGPA is slightly lower 
than in prior studies (r = 0.74), there is still a very strong 
relationship between the two measures. 

RR No. 2009-5 Item No.: 11b-3395 13 pgs

2009 $15 

Testing Accommodations for English Language 
Learners: A Review of State and District Policies 

John W. Young and Teresa C. King 

This report is a review and summary of current
information regarding test accommodations
used in different states and districts for English language 
learners (ELL). Similarities and differences among states 
regarding ELL accommodation are documented. 

RR No. 2008-6 Item No.: 080482716 13 pgs 

2008 $15 

Stereotype Threat Spillover and SAT Scores 

Michael E. Walker and Brent Bridgeman 
A recent study by Beilock, Rydell, and McConnell (2007)
suggested that stereotype threat experienced in one 
domain (e.g., math) triggered by knowledge of a negative 
stereotype about a social group in that particular domain 
can spill over into subsequent tasks in totally unrelated
domains (e.g., reading). The authors suggested that these
findings might have implications for how the ordering 
of sections on standardized tests such as the SAT or 
GRE could affect examinee performance. To test the 
authors’ assertions, this study used data from a recent 
SAT administration in which either a reading, a math, or 
a writing task preceded a reading task. Performance on 
the subsequent reading task of members of a stereotype 
threatened group (i.e., women) who took the math task 
first was compared to performance of those who took the 
reading or writing task first. Results were inconsistent 
with the stereotype threat spillover hypothesis, and 
serve to justify the exhortation of Cullen, Hardison, and 
Sackett (2004) for caution in generalizing lab findings on 
stereotype threat to operational testing situations. 

RR No. 2008-2 Item No.: 080482549 10 pgs 

2008 $15 

A Historical View of Subgroup Performance 
Differences on the SAT Reasoning Test 

Jennifer L. Kobrin, Viji Sathy, and Emily J. Shaw 

This report presents and reviews gender, racial/ethnic, 
language, and socioeconomic subgroup performance 
differences on the SAT over nearly the last two decades. 
Theories on the existence of subgroup differences 
are examined. Substantial revisions to the SAT were 
made in 1994 and again in 2005; the short-term and 
long-term impact of these revisions on subgroup 
differences is evaluated. Furthermore, the trends in 
subgroup differences on the SAT are compared to those 
documented for other large-scale standardized tests 
(i.e., the ACT Assessment, National Assessment of 
Educational Progress), as well as those found in high 
school grades. 

RR No. 2006-5 Item No.: 060481915 47 pgs 

2006 $15 
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Observational Timing Study on the SAT 
Reasoning Test for Test-Takers with Learning 
Disabilities and/or AD/HD 

Cara Cahalan-Laitusis, Teresa C. King, Frederick Cline, 
and Brent Bridgeman 

The purpose of this study is to provide information on 
actual time used by students with disabilities on the 
new SAT. This study observed students with learning 
disabilities and/or attention deficit/hyperactivity 
disorder (AD/HD) as they took SAT items under strict 
time limits. The study is a replication of study 2 in 
Bridgeman, Cahalan, and Cline (2003), which observed 
students without disabilities completing the same 
test items that are included in this study. There is a 
clear distinction in the mean time spent on the test 
between students without a disability and students with 
a disability. 

RR No. 2006-4 Item No.: 060481884 13 pgs 

2006 $15 

A Portrait of Advanced Placement 
Teachers’ Practices 

Pamela L. Paek, Eva Ponte, Irv Sigel, Henry Braun, and 
Don Powers 

In this study the authors: (1) developed and pilot tested 
an instrument that could be used to document the 
practices of AP teachers; (2) systematically sampled 
AP teachers; (3) administered the final instrument to 
sampled teachers; and (4) summarized the responses for 
each of two subject areas, biology and U.S. history. 

RR No. 2005-7 Item No.: 050481444 41 pgs 

2005 $15 

Researching the Educational Benefits 
of Diversity 

Emily J. Shaw 

Researching the educational benefits of diversity is 
necessary in order to offer evidence to judges, 
attorneys, and policymakers to uphold and support the 
consideration of race in college admissions. This report 
offers several examples of previous studies, as well as
recommendations and considerations for institutions
interested in designing and carrying out their own 
research studies on the educational benefits of diversity. 

RR No. 2005-4 Item No.: 050481411 26 pgs

2005 $15 

What Are the Characteristics of AP Teachers?
An Examination of Survey Research

Glenn B. Milewski and Jacqueline M. Gillie 

Information on test-takers collected at the time of 
examinations provides a rich description of AP students, 
but what are the characteristics of their teachers? This 
study provides a glimpse into these characteristics by 
summarizing the results of the largest survey of AP 
teachers to date (32,109 responses). The AP Teacher 
Survey contained 40 questions covering the following 
content areas: classroom characteristics, teacher
background, professional development, training and
resource needs, technology, and important issues for AP 
teachers. 

RR No. 2002-10 Item No.: 040481186 18 pgs 

2002 $15 

The College Board National High School 
Survey Report 

Lawrence Maucieri, Renee Gernand, and Thanos Patelis 
This survey, administered in 2000, follows a similar, 
large-scale survey of high schools in 1993 by the College 
Board. The current research report is designed to provide 
the reader with detailed and updated information about 
high schools in the United States. It provides basic facts, 
figures, and trends about common features among 
the participating institutions. The report also presents 
evidence in support of observable trends among the 
educational and demographic factors of concern as of 
2000 and 1993. 

RR No. 2002-4 Item No.: 994516 51 pgs 

2002 $15 


Differential Validity, Differential Prediction, 
and College Admission Testing: A 
Comprehensive Review and Analysis 

John W. Young with the assistance of Jennifer L. Kobrin 
This research report is a review and analysis of all 
of the published studies during the past 25+ years 
(since 1974) in the area of differential validity/ 
prediction and college admission testing. More specifically, 
this report includes 49 separate studies of differences 
in validity and/or prediction for different racial/ethnic 
groups and/or for men and women. The studies
range from single-institution studies based on a
single cohort of several hundred students to large-scale 
compilations of results across hundreds of institutions 
that included several thousand students in all. 

RR No. 2001-6 Item No.: 993362 41 pgs 

2001 $15 

Writing Assessment in Admissions to Higher 
Education: Review and Framework 

Hunter M. Breland, Brent Bridgeman, and Mary Fowles

A comprehensive review was conducted of writing 
research literature and writing test program activities 
in a number of testing programs. The review was 
limited to writing assessments used for admission in 
higher education. Programs reviewed included ACT, 
the California State Universities and Colleges testing 
program, SAT, GMAT, GRE, LSAT, MCAT, and TOEFL. 

RR No. 99-3 Item No.: 217798 40 pgs

1999 $15 

Eligibility Issues and Comparable Time Limits 
for Disabled and Nondisabled SAT Examinees 

Marjorie Ragosta and Cathy Wendler 
The primary purpose of this study was to establish 
empirically derived testing times for special 
administrations of the SAT for examinees with 
disabilities. A secondary purpose was to establish 
eligibility guidelines for individuals taking special 
administrations. This project used data from test 
administration timing records, the College Board’s SAT 
history file, and a survey questionnaire to investigate two 
issues: comparable testing time (between disabled and 
nondisabled test-takers) and eligibility for special test 
accommodations for SAT examinees with disabilities.
Alternatives to the current eligibility policy and their 
implications are discussed, including a change to school- 
based criteria and the use of individualized testing 
programs. The effects of empirically derived testing 
times are also discussed. 

RR No. 92-5 Item No.: 215446 34 pgs 

1992 $15 

The SAT: Four Major Modifications of the 
1970-85 Era 

John R. Valley 

From 1970 to 1985, the SAT underwent major 
modifications caused by: (1) the addition of the Test of 
Standard Written English (TSWE) to the College Board’s 
Admissions Testing Program (ATP); (2) the passage of test 
disclosure legislation; (3) the institution of test sensitivity 
reviews; and (4) the use of item-response theory equating 
in SAT scores. This report discusses these modifications 
as they relate to the SAT’s content, format, development 
procedures, psychometric characteristics, and statistical 
procedures. While the SAT is an instrument that has 
undergone some modification throughout its existence, 
the measurement properties have changed little, if at 
all. The SAT has had, and continues to have, a distinct 
identity. 

RR No. 92-1 Item No.: 215441 29 pgs 

1992 $15 

Young SAT-Takers: Two Surveys 

Gita Wilder, Patricia Lund Casserly, and 
Nancy W. Burton 

The first survey of young SAT takers and their parents 
collected information from a sample of junior high 
students who took the SAT in January 1984 for talent 
search purposes. The students were selected to take the 
SAT mainly through invitations from schools; their 
parents, however, provided the greatest encouragement 
for them to take the test. More than half the students 
reported being invited to participate in one or more 
activities sponsored by a talent search, most commonly 
a summer course on a college campus. Students and 
parents alike evaluated the experience positively. The 
second survey investigated a population of young
talent-search applicants who took the SAT as seventh-graders
in 1980-81 and traced the applicants’ test-taking and 
academic history up to their high school graduation in 
1986. The talent-search applicants were significantly 
ahead of the other members of the college-bound cohort 
in more advanced academic areas. The average last SAT 
scores for the 1980-81 talent-search applicants were 
560 verbal and 618 mathematical, about one standard 
deviation higher than for the average college-bound 
senior in 1986. 

RR No. 88-1 Item No.: 217761 44 pgs 

1988 $15 

Students with Disabilities: Four Years of Data 
from Special Test Administrations of the 
Scholastic Aptitude Test, 1980-83 

Marjorie Ragosta 

Every year since 1972 the College Board has issued a 
national report about the Admissions Testing Program 
(ATP) test scores and responses from the Student 
Descriptive Questionnaire (SDQ). This report offers 
the first analogous data for college candidates who took 
special test administrations of the SAT through ATP 
Services for Handicapped Students from 1980-83. 

RR No. 87-2 Item No.: 275897 70 pgs 

1987 $15 

Black Students in Predominantly White North 
Carolina Colleges and Universities, 1986: A 
Replication of a 1970 Study 

Junius A. Davis and Anne Borders-Patterson 

What was it like to be a black student on a traditionally 
or predominantly white campus in 1986? This report 
is a summary of what 22 black student leaders from 13 
predominantly white campuses in North Carolina found 
in exploring this basic question with random samples of 
black first-year students at their institutions. Throughout 
the report, the experiences and perceptions of the black 
freshmen in 1986 are compared with those of their 1970 
cohorts and the types of change that appear to have 
taken place for the affected students and institutions are 
determined. 

RR No. 86-7 Item No.: 275896 23 pgs 

1986 $15 


Advanced Placement Revisited 

Patricia Lund Casserly 

This report describes the first large-scale study to assess 
Advanced Placement Program (AP) outcomes since 
1963. Part 1 of the study examined the validity of AP 
Examination grades as indicators of students’ readiness 
to undertake certain advanced sequent courses as 
freshmen. Part 2 examined the larger role of the AP 
Program in their lives. The study found that overall AP 
candidates who were placed ahead in the field of their 
qualifying AP Examinations did better than a generality 
of upperclassmen in those courses. AP candidates reacted 
very positively to the AP experience in high school 
and to the outcomes of their participation in college. 
Students offered recommendations to institutions that 
wish to foster a smooth transition into challenging 
college careers for future AP candidates. 

RR No. 86-6 Item No.: 275895 14 pgs 

1986 $15 

Uses of the SAT in the University System 
of Georgia 

Cameron Fincher 

The SAT has been required for admission to campuses of 
the University System of Georgia since 1957. Although 
continuing to turn out annual normative data, the 
university system has left research uses of SAT data to 
occasional doctoral dissertations, usually in education, 
or to occasional studies by faculty members. This study 
examines the various uses of the SAT at the University 
of Georgia. 

RR No. 86-5 Item No.: 275894 70 pgs 

1986 $15 

Cognitive Assessment and the Media 

Philip K. Oltman 

The present-day information environment is heavily 
saturated with electronic media. What are the properties 
of these media, and how does massive exposure to them 
affect the cognitive functioning of the audience? These 
and related issues concerning the media’s influence on
cognitive functioning are reviewed, and implications for 
cognitive assessment are discussed. 

RR No. 83-1 Item No.: 275869 6 pgs 

1983 $15 

The Role of Academic Ability in High-Level 
Accomplishment and General Success 

Leonard L. Baird 

The relationship of measures of academic ability and 
grades with high-level accomplishment was examined 
by reviewing a wide-ranging literature. In general, the 
studies demonstrated low positive relationships between 
academic aptitude and/or grades and accomplishment. The
closer the content of the measure of academic aptitude 
to the demands of the field, the stronger the relationship. 

RR No. 82-6 Item No.: 275866 39 pgs

1982 $15 

Student Characteristics and the Use of the SAT 
Test Disclosure Materials 

Marlaine E. Lockheed, Paul W. Holland, and William P. 
Nemceff 

Following the enactment of the New York State 
standardized admission testing law, students taking the 
SAT in New York could request and receive a copy of test 
questions used in calculating their scores, a copy of their 
answer sheet, and various interpretive materials. This 
study examined: (1) the differences between examinees 
who requested these disclosure materials and those 
who did not, and (2) the differences between examinee 
subpopulations in the likelihood of their requesting 
disclosure. Significant differences in both raw and 
adjusted odds-ratios were found. Within each category, 
those most likely to request disclosed materials were 
examinees who were not seeking financial aid for college 
attendance. The likelihood of requesting disclosure 
differed both among different ethnic groups and between 
the March and May SAT administrations. 

RR No. 82-3 Item No.: 275863 26 pgs 

1982 $15 


Abstracts from the Research and Development 
Report Series 1963-81 

Patricia K. Hendel, Editor 

During the period of 1963 through early 1981, the results 
of 120 research and development projects conducted 
by Educational Testing Service (ETS) on behalf of the 
College Board were published in the Research and 
Development Report (RDR) series. That series ended 
with the introduction in 1981 of the College Board Report 
(CBR) series, which includes ETS studies on behalf of the 
Board as well as other reports. One hundred and nine 
abstracts are included; abstracts of the remaining 11 
studies are not available. 

RR No. 82-1 Item No.: 275861 42 pgs 

1982 $15 

Group Comparisons for Basic Skills Measures 

Hunter M. Breland and Philip A. Griswold 
Correlation, regression, and score interval analyses were
conducted for six academic measures as predictors of 
essay writing and overall performance. Comparisons for 
all analyses were made for men, women, Asians, African
Americans, Hispanics, and whites. The correlational 
comparisons showed few differences across groups, 
except that correlations tended to be lower for the white 
sample because of variance restrictions. The regression 
comparisons agreed with previous studies, showing 
African Americans and Hispanics to be generally 
overpredicted. On essay-writing performance, men 
were also overpredicted and women underpredicted by 
conventional basic skills measures. 

RR No. 81-6 Item No.: 275856 29 pgs 

1981 $15 

Validity of the SAT for Predicting First-Year
Grades: 2008 SAT Validity Sample 

Krista D. Mattern and Brian F. Patterson 

The findings for the 2008 sample are largely consistent 
with the previous reports. SAT scores were found to 
be correlated with FYGPA (r = 0.54), with a magnitude 
similar to HSGPA (r = 0.56). The best set of predictors of 
FYGPA remains SAT scores and HSGPA (r = 0.63), as the
addition of the SAT sections to the correlation of HSGPA
alone with FYGPA leads to a substantial improvement
in prediction (Δr = 0.07). This finding was consistent
across all subgroups of the sample, by both institutional
characteristics and demographics (Δr > 0.06).

Stat No. 2011-5 22 pgs 

2011 $15 

The Relationship Between SAT Scores and 
Retention to the Second Year: 2007 SAT Validity 
Sample 

Krista D. Mattern and Brian F. Patterson 
This report presents the findings from a replication of 
the analyses from the report, “Is Performance on the SAT 
Related to College Retention?” (Mattern & Patterson, 
2009). The tables below are based on the 2007 sample, 
and the findings are largely the same as those presented 
in the original report. They show SAT scores are related 
to second-year retention. Even after controlling for 
student and institutional characteristics, returners had 
higher SAT total scores than non-returners, by an average 
of 116 points. This held true even within each subgroup 
analyzed, meaning the SAT performance gap is not due 
to differences in the demographic characteristics of the 
two groups. Also, this report finds that differences in 
retention rates by student subgroups are minimized and 
in some instances eliminated when controlling for SAT 
performance. This is particularly noticeable with respect 
to differences in retention rates by ethnicity. 

Stat No. 2011-4 15 pgs 

2011 $15 


Validity of the SAT for Predicting Third-Year 
Grades: 2006 SAT Validity Sample 

Krista D. Mattern and Brian F. Patterson 
This report presents the validity of the SAT for predicting 
two third-year college outcomes: (1) third-year
cumulative GPA, and (2) third-year grade point average.
Similar to the results for first- and second-year outcomes 
(1st Yr GPA, 2nd Yr GPA, 2nd Yr Cum GPA), the SAT is 
strongly correlated with third-year outcomes overall and 
by institutional (control, selectivity, size) and student 
(gender, race/ethnicity, best language) characteristics. 

Stat No. 2011-3 27 pgs

2011 $15 

The Relationship Between SAT Scores and 
Retention to the Third Year: 2006 SAT Validity 
Sample 

Krista D. Mattern and Brian F. Patterson 
Results show that SAT performance is related to third- 
year retention rates. Even after controlling for student and 
institutional characteristics, returners had higher SAT 
total scores than non-returners, and the performance 
gap is not due to differences in the demographic makeup 
of the two groups. Furthermore, while differences in 
retention can be observed between various student and 
institutional subgroups, these differences are minimized 
and in some instances eliminated when controlling for 
SAT performance, especially for higher SAT score bands. 

Stat No. 2011-2 16 pgs

2011 $15 

Validity of the SAT for Predicting Second-Year
Grades: 2006 SAT Validity Sample 

Krista D. Mattern and Brian F. Patterson 
This report presents the validity of the SAT for predicting 
two second-year outcomes: (1) second-year cumulative 
GPA (2nd Yr Cum GPA), and (2) second-year grade 
point average (2nd Yr GPA). Similar to the results for 
first-year grade point average (1st Yr GPA), the SAT 
is strongly correlated with second-year outcomes. For 
many significant subgroups, such as ethnic minority 
students and female students, the SAT was in fact a better 
predictor of 2nd Yr Cum GPA and 2nd Yr GPA than were 
high school grades alone. However, for all students, the
SAT score in combination with high school grades was
the best predictor of these second-year outcomes since 
both measures provide incremental validity over each
other. 

Stat No. 2011-1 30 pgs 

2011 $15 

Validity of the SAT for Predicting FYGPA — 
2007 SAT Validity Sample 

Krista D. Mattern, Brian F. Patterson, and 
Jennifer L. Kobrin 

This report presents the findings from a replication 
of the Kobrin et al. (2008) and Mattern et al. (2008) 
reports. Students who were missing at least one of the 
following were excluded from the analyses: SAT scores, a 
self-reported high school grade point average (HSGPA), 
and a valid first-year GPA (FYGPA); this resulted in a 
final sample size of 159,286. Based on Powers (2004), the 
analytical procedure for computing multiple correlations 
was modified slightly from what was done in the two 
original reports. 

Stat No. 2009-1 20 pgs 

2009 $15 

You may download copies of these reports from the College
Board website www.collegeboard.org/research/home or 